It’s been a while since my last post – things have been very busy! I’ve just moved across to the Coherence Development team, and it’s definitely an exciting time to join. Once I’ve settled in, I look forward to posting more regularly. On to the post…
Overview
Many customers I’ve worked with want to run a single cluster across multiple data centers to provide disaster recovery (DR) capabilities. Where the data centers are connected by relatively slow networks, it is much better to run two separate clusters and connect them via Coherence*Extend. The Incubator site has patterns for this, such as the Push Replication pattern, which shows how to replicate data between sites in this configuration.
But many of these data centers are now connected by links of 10Gb or faster, and customers are asking “Why can’t we have a single cluster across both?” This is possible, but not recommended unless latencies are extremely low, because a slower link can affect the entire cluster. A detailed discussion is for another day, as there are many other factors to consider, but it can be done under the right conditions.
Distributed Cache Diversion
Before we get into more detail, a quick diversion on how data is partitioned in a distributed cache (this will help us understand the end result we are trying to achieve). In a distributed cache, data is evenly distributed across all the available storage-enabled members using a common hashing algorithm. Where possible, without affecting the overall balance of data, the primary and backup copies of data are kept on separate physical machines for reliability. When every primary copy in the service that contains the cache has its backups on a physically separate machine, the service is said to be machine-safe. In this state the service can survive the loss of an entire machine without losing data.
See here for more information on distributed caches.
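As a rough illustration of “a common hashing algorithm”, here is a minimal Java sketch in which every member maps a key to a partition with the same hash function, so any member can work out which partition a key belongs to. It assumes a simple hashCode/floorMod scheme; Coherence uses its own key partitioning strategy, so real partition numbers will differ, and the class name is hypothetical.

```java
// Hypothetical sketch of key-to-partition hashing. Coherence uses its own
// key partitioning strategy, so real partition numbers will differ.
final class PartitionSketch {

    private PartitionSketch() {
    }

    // Map a key to a partition in [0, partitionCount). Every member applies the
    // same function, so any member can work out which partition owns a key.
    static int partitionFor(Object key, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}
```

For example, partitionFor(-5, 1049) returns 1044, and a given key always maps to the same partition no matter which member asks. The partitions themselves are then spread evenly over the storage-enabled members.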
Taking this further, what if you have multiple racks within a data centre and a Coherence cluster spans them? What about multiple sites? How do we become rack-safe or site-safe? Prior to 3.7.1, only the machine a cache server resided on was taken into account when making these backup decisions, not its rack or site. In 3.7.1, with the new Simple Partition Assignment Strategy, Coherence takes the entire topology of the cluster, from machine to rack to site, into consideration when backing up data.
So now clusters can be not only machine-safe, but rack-safe and site-safe as well!
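One way to picture the topology-aware check is to compare where a partition’s primary and backup copies live, from the widest failure domain (site) down to the individual member, and to treat the service as only as safe as its weakest partition. The following Java sketch models that idea; the class, enum and method names are hypothetical, it assumes a single backup copy, and it is not Coherence’s internal implementation.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical model of topology-aware backup safety; not Coherence's
// internal implementation. Assumes exactly one backup copy per partition.
final class SafetySketch {

    private SafetySketch() {
    }

    // Ordered from weakest to strongest, mirroring the StatusHA values in this post.
    enum Safety { ENDANGERED, NODE_SAFE, MACHINE_SAFE, RACK_SAFE, SITE_SAFE }

    // Identity of a cache server: member name plus machine, rack and site (may be null).
    static final class Location {
        final String member;
        final String machine;
        final String rack;
        final String site;

        Location(String member, String machine, String rack, String site) {
            this.member = member;
            this.machine = machine;
            this.rack = rack;
            this.site = site;
        }
    }

    // Safety of one partition: the widest failure domain that separates
    // its primary copy from its backup copy.
    static Safety partitionSafety(Location primary, Location backup) {
        if (backup == null || primary.member.equals(backup.member)) {
            return Safety.ENDANGERED; // no independent backup copy
        }
        if (!Objects.equals(primary.site, backup.site)) {
            return Safety.SITE_SAFE;
        }
        if (!Objects.equals(primary.rack, backup.rack)) {
            return Safety.RACK_SAFE;
        }
        if (!Objects.equals(primary.machine, backup.machine)) {
            return Safety.MACHINE_SAFE;
        }
        return Safety.NODE_SAFE; // different member, same machine
    }

    // The service is only as safe as its weakest partition. Each element is
    // {primary, backup}; an empty list is treated as trivially SITE_SAFE.
    static Safety serviceSafety(List<Location[]> placements) {
        Safety weakest = Safety.SITE_SAFE;
        for (Location[] pair : placements) {
            Safety s = partitionSafety(pair[0], pair[1]);
            if (s.compareTo(weakest) < 0) {
                weakest = s;
            }
        }
        return weakest;
    }
}
```

For example, a primary on cacheserver1 (server1, rack1) with its backup on cacheserver5 (server3, rack2) is rack-safe, but if any other partition still has its backup in the same rack as its primary, the service as a whole is only MACHINE_SAFE in this model. That is consistent with the show output in the demonstration reporting MACHINE-SAFE while partitions are still being transferred.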
Back to the example
Before 3.7.1, the only viable way to achieve a “site-safe” cluster was to set the machine name manually, forcing Coherence to treat the two sites as two machines, i.e. to back up across the so-called “machines”. That’s a reasonable approach, but if we lose a site, the cluster can only be node-safe, not machine-safe: Coherence thinks it has only one machine left, so the loss of a single physical machine could cause data loss.
For the two-site cluster in the diagram below, this would be done by setting the following:
-Dtangosol.coherence.machine=siteA (for all machines on Site A)
-Dtangosol.coherence.machine=siteB (for all machines on Site B)
Using the Simple Partition Assignment Strategy
With the new Simple Partition Assignment Strategy in 3.7.1 and the cluster setup above, as long as you set the site name using -Dtangosol.coherence.site=siteA and -Dtangosol.coherence.site=siteB (or the equivalent setting in an operational override file), you will be able to achieve a site-safe configuration, i.e. you could lose an entire site at once and not lose any data. Similarly, if you have multiple racks in your configuration, as long as you identify them using the -Dtangosol.coherence.rack setting, you can achieve a rack-safe configuration, i.e. you could lose an entire rack and not lose any data.
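To make “lose an entire site and not lose data” concrete: with one backup copy, a partition is lost only if both its primary and its backup live in the failed site (or rack). The hypothetical Java sketch below checks that for every failure domain; the names are illustrative, and it assumes exactly one backup per partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical failure-simulation sketch: which partitions would lose both
// copies if every member in one failure domain (site, rack, ...) went down?
final class FailureSketch {

    private FailureSketch() {
    }

    // primaryDomain[p] and backupDomain[p] name the failure domain holding the
    // primary and the (single) backup copy of partition p.
    static List<Integer> partitionsLostIfDown(String failedDomain,
                                              String[] primaryDomain,
                                              String[] backupDomain) {
        List<Integer> lost = new ArrayList<>();
        for (int p = 0; p < primaryDomain.length; p++) {
            if (Objects.equals(failedDomain, primaryDomain[p])
                    && Objects.equals(failedDomain, backupDomain[p])) {
                lost.add(p);
            }
        }
        return lost;
    }

    // True if no single domain failure loses a partition. Only domains that
    // hold a primary need checking, since a loss needs both copies in one domain.
    static boolean survivesAnySingleLoss(String[] primaryDomain, String[] backupDomain) {
        for (String domain : primaryDomain) {
            if (!partitionsLostIfDown(domain, primaryDomain, backupDomain).isEmpty()) {
                return false;
            }
        }
        return true;
    }
}
```

With primaries in {siteA, siteB, siteA} and backups in {siteB, siteA, siteB}, losing either site loses nothing. If partition 2’s backup were also on siteA, which can happen before 3.7.1 when sites are not considered, losing siteA would lose partition 2.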
Demonstration
To demonstrate this, I’m using part of the Coherence Incubator functionality that lets you easily start up and shut down multiple cache servers, either in-process or as separate processes. I’ve built a wrapper around this as a simple command-line utility that lets me dynamically specify a machine, rack and site before starting up cache servers.
I’ll provide details of the code below, but in my cache configuration, all I have to do to enable this for a service is set the partition-assignment-strategy:
<?xml version="1.0"?>
<cache-config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xmlns="http://xmlns.oracle.com/coherence/coherence-cache-config"
              xsi:schemaLocation="http://xmlns.oracle.com/coherence/coherence-cache-config http://xmlns.oracle.com/coherence/coherence-cache-config/1.1/coherence-cache-config.xsd">
  <caching-scheme-mapping>
    <cache-mapping>
      <cache-name>*</cache-name>
      <scheme-name>example-distributed</scheme-name>
    </cache-mapping>
  </caching-scheme-mapping>
  <caching-schemes>
    <distributed-scheme>
      <scheme-name>example-distributed</scheme-name>
      <service-name>DistributedCache</service-name>
      <thread-count>5</thread-count>
      <partition-count>1049</partition-count>
      <!-- Machine-, rack- and site-aware partition assignment, new in 3.7.1 -->
      <partition-assignment-strategy>
        <instance>
          <class-name>com.tangosol.net.partition.SimpleAssignmentStrategy</class-name>
        </instance>
      </partition-assignment-strategy>
      <backing-map-scheme>
        <local-scheme>
          <unit-calculator>BINARY</unit-calculator>
        </local-scheme>
      </backing-map-scheme>
      <autostart>true</autostart>
    </distributed-scheme>
  </caching-schemes>
</cache-config>
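If you manage several cache configuration files, it can be handy to check which services actually have the strategy set. Here is a hypothetical JDK-only sketch (not part of Coherence or the Incubator) that parses a cache configuration with the standard DOM API and reports the partition-assignment-strategy class for each distributed-scheme.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Coherence): reports which partition
// assignment strategy each distributed-scheme in a cache config names.
final class StrategyCheck {

    private StrategyCheck() {
    }

    // Returns scheme-name -> strategy class (or "(not set)") in document order.
    static Map<String, String> strategyByScheme(String cacheConfigXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(cacheConfigXml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> result = new LinkedHashMap<>();
        NodeList schemes = doc.getElementsByTagName("distributed-scheme");
        for (int i = 0; i < schemes.getLength(); i++) {
            Element scheme = (Element) schemes.item(i);
            Element name = firstChild(scheme, "scheme-name");
            Element strategy = firstChild(scheme, "partition-assignment-strategy");
            result.put(name == null ? "(unnamed)" : name.getTextContent().trim(),
                    strategy == null ? "(not set)" : classNameIn(strategy));
        }
        return result;
    }

    // First direct child element with the given tag name, or null if absent.
    private static Element firstChild(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(((Element) n).getTagName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Text of the first <class-name> under the strategy element, otherwise
    // the element's own text.
    private static String classNameIn(Element strategy) {
        NodeList names = strategy.getElementsByTagName("class-name");
        return names.getLength() == 0 ? strategy.getTextContent().trim()
                : names.item(0).getTextContent().trim();
    }
}
```

Run against the configuration above, it reports example-distributed mapped to com.tangosol.net.partition.SimpleAssignmentStrategy. A scheme without the element shows as (not set), meaning the service uses its default strategy.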
Rack-safe Configuration
Consider the example below where, for simplicity, you have a single site with 2 racks, each containing 2 servers. Each server runs 2 cache servers.
Running my utility (described below) and passing the IP address of my machine for the WKA configuration, I can create this setup using the set rack and set machine commands, which set the tangosol.coherence.rack and tangosol.coherence.machine system properties before starting the cache server(s).
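The utility’s source is not reproduced in this post, so the following is only a hypothetical sketch of what its set machine, set rack and set site commands could do: record the identity in the tangosol.coherence.* system properties so they are in place before a cache server starts. For cache servers launched as separate processes, the equivalent is passing the matching -D options on each child JVM’s command line.

```java
// Hypothetical sketch of "set machine / set rack / set site" commands: they only
// record the member identity in system properties; starting the server is not shown.
final class MemberIdentity {

    static final String MACHINE = "tangosol.coherence.machine";
    static final String RACK = "tangosol.coherence.rack";
    static final String SITE = "tangosol.coherence.site";

    private MemberIdentity() {
    }

    // Set one identity property, or clear it when the value is null or empty.
    static void set(String property, String value) {
        if (value == null || value.isEmpty()) {
            System.clearProperty(property);
        } else {
            System.setProperty(property, value);
        }
    }

    // Render the current identity in the same shape as the utility's prompt.
    static String prompt() {
        return "Command{machine=" + System.getProperty(MACHINE, "")
                + ",rack=" + System.getProperty(RACK, "")
                + ",site=" + System.getProperty(SITE, "") + "}";
    }
}
```

After set(RACK, "rack1") and set(MACHINE, "server1") with no site, prompt() produces Command{machine=server1,rack=rack1,site=}, the same shape as the prompt in the transcript.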
$ ./run.sh 192.168.88.1
Oracle Coherence Version 3.7.1.3 Build 31790
Grid Edition: Development mode
Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
Using the Incubator Extensible Environment for Coherence Cache Configuration
Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
Type help for help or quit to exit.
Command{machine=,rack=,site=}: set rack rack1
Command{machine=,rack=rack1,site=}: set machine server1
Command{machine=server1,rack=rack1,site=}: start 2
Command{machine=server1,rack=rack1,site=}:
[cacheserver2:4252] 1:
[cacheserver2:4252] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver2:4252] 3: Grid Edition: Development mode
[cacheserver2:4252] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver2:4252] 5:
[cacheserver1:4251] 1:
[cacheserver1:4251] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver1:4251] 3: Grid Edition: Development mode
[cacheserver1:4251] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver1:4251] 5:
[cacheserver2:4252] 6:
[cacheserver2:4252] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver2:4252] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver2:4252] 9:
[cacheserver1:4251] 6:
[cacheserver1:4251] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver1:4251] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver1:4251] 9:
Please enter a command.
Command{machine=server1,rack=rack1,site=}: set machine server2
Command{machine=server2,rack=rack1,site=}: start 2
Command{machine=server2,rack=rack1,site=}:
[cacheserver3:4257] 1:
[cacheserver3:4257] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver3:4257] 3: Grid Edition: Development mode
[cacheserver3:4257] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver3:4257] 5:
[cacheserver4:4258] 1:
[cacheserver4:4258] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver4:4258] 3: Grid Edition: Development mode
[cacheserver4:4258] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver4:4258] 5:
[cacheserver3:4257] 6:
[cacheserver3:4257] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver3:4257] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver3:4257] 9:
[cacheserver4:4258] 6:
[cacheserver4:4258] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver4:4258] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver4:4258] 9:
Please enter a command.
Command{machine=server2,rack=rack1,site=}: show
Partition Count: 1049, Unowned: 0
Name PID Machine Rack Site Partitions
============= ========== ============== ============== ============== ==========
cacheserver3 4257 server2 rack1 null
cacheserver4 4258 server2 rack1 null
cacheserver1 4251 server1 rack1 524
cacheserver2 4252 server1 rack1 525
StatusHA is NODE-SAFE
Not machine-safe yet because partitions are still being transferred.
Command{machine=server2,rack=rack1,site=}: show
Partition Count: 1049, Unowned: 0
Name PID Machine Rack Site Partitions
============= ========== ============== ============== ============== ==========
cacheserver3 4257 server2 rack1 262
cacheserver4 4258 server2 rack1 262
cacheserver1 4251 server1 rack1 262
cacheserver2 4252 server1 rack1 263
StatusHA is MACHINE-SAFE
Machine-safe now, so add new machines in a different rack.
Command{machine=server2,rack=rack1,site=}: set rack rack2
Command{machine=server2,rack=rack2,site=}: set machine server3
Command{machine=server3,rack=rack2,site=}: start 2
Command{machine=server3,rack=rack2,site=}:
[cacheserver5:4270] 1:
[cacheserver5:4270] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver5:4270] 3: Grid Edition: Development mode
[cacheserver5:4270] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver5:4270] 5:
[cacheserver6:4271] 1:
[cacheserver6:4271] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver6:4271] 3: Grid Edition: Development mode
[cacheserver6:4271] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver6:4271] 5:
[cacheserver5:4270] 6:
[cacheserver5:4270] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver5:4270] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver5:4270] 9:
[cacheserver6:4271] 6:
[cacheserver6:4271] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver6:4271] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver6:4271] 9:
Please enter a command.
Command{machine=server3,rack=rack2,site=}: show
Partition Count: 1049, Unowned: 0
Name PID Machine Rack Site Partitions
============= ========== ============== ============== ============== ==========
cacheserver3 4257 server2 rack1 262
cacheserver5 4270 server3 rack2 null
cacheserver4 4258 server2 rack1 262
cacheserver1 4251 server1 rack1 262
cacheserver6 4271 server3 rack2 null
cacheserver2 4252 server1 rack1 263
StatusHA is MACHINE-SAFE
Command{machine=server3,rack=rack2,site=}: show
Partition Count: 1049, Unowned: 0
Name PID Machine Rack Site Partitions
============= ========== ============== ============== ============== ==========
cacheserver3 4257 server2 rack1 209
cacheserver5 4270 server3 rack2 210
cacheserver4 4258 server2 rack1 210
cacheserver1 4251 server1 rack1 210
cacheserver6 4271 server3 rack2 null
cacheserver2 4252 server1 rack1 210
StatusHA is MACHINE-SAFE
Command{machine=server3,rack=rack3,site=}: set rack rack2
Command{machine=server3,rack=rack2,site=}: set machine server4
Command{machine=server4,rack=rack2,site=}: start 2
Command{machine=server4,rack=rack2,site=}:
[cacheserver8:4283] 1:
[cacheserver8:4283] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver8:4283] 3: Grid Edition: Development mode
[cacheserver8:4283] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver8:4283] 5:
[cacheserver7:4282] 1:
[cacheserver7:4282] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver7:4282] 3: Grid Edition: Development mode
[cacheserver7:4282] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver7:4282] 5:
[cacheserver8:4283] 6:
[cacheserver8:4283] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver8:4283] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver8:4283] 9:
[cacheserver7:4282] 6:
[cacheserver7:4282] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver7:4282] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver7:4282] 9:
Please enter a command.
Command{machine=server4,rack=rack2,site=}: show
Partition Count: 1049, Unowned: 0
Name PID Machine Rack Site Partitions
============= ========== ============== ============== ============== ==========
cacheserver3 4257 server2 rack1 150
cacheserver8 4283 server4 rack2 null
cacheserver7 4282 server4 rack2 127
cacheserver5 4270 server3 rack2 173
cacheserver4 4258 server2 rack1 149
cacheserver1 4251 server1 rack1 150
cacheserver6 4271 server3 rack2 150
cacheserver2 4252 server1 rack1 150
StatusHA is MACHINE-SAFE
Still only machine-safe, as not all partitions have been transferred yet.
Command{machine=server4,rack=rack2,site=}: show
Partition Count: 1049, Unowned: 0
Name PID Machine Rack Site Partitions
============= ========== ============== ============== ============== ==========
cacheserver3 4257 server2 rack1 131
cacheserver8 4283 server4 rack2 131
cacheserver7 4282 server4 rack2 131
cacheserver5 4270 server3 rack2 131
cacheserver4 4258 server2 rack1 132
cacheserver1 4251 server1 rack1 131
cacheserver6 4271 server3 rack2 131
cacheserver2 4252 server1 rack1 131
StatusHA is RACK-SAFE
Command{machine=server4,rack=rack2,site=}:
Now the cluster is rack-safe!
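The final counts above (131 partitions on seven members and 132 on one) are simply 1049 partitions split as evenly as possible over eight cache servers. A tiny hypothetical helper makes the arithmetic explicit; the real strategy also has to honour the machine, rack and site constraints, so the intermediate snapshots are less even.

```java
// Hypothetical helper: the most even split of P partitions over M members.
final class BalanceSketch {

    private BalanceSketch() {
    }

    // Returns {base, extra}: every member owns 'base' partitions and
    // 'extra' members own one more (base = P / M, extra = P % M).
    static int[] evenSplit(int partitionCount, int memberCount) {
        if (memberCount <= 0) {
            throw new IllegalArgumentException("memberCount must be positive");
        }
        return new int[] {partitionCount / memberCount, partitionCount % memberCount};
    }
}
```

evenSplit(1049, 8) returns {131, 1}: every member owns 131 partitions and one owns an extra one. evenSplit(1049, 6) returns {174, 5}, matching the six-member snapshot in the site-safe run, where five members hold 175 and one holds 174.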
Site-safe Configuration
Now let’s look at the case where we have 2 sites with 2 servers on each site (again for simplicity).
$ ./run.sh 192.168.88.1
Oracle Coherence Version 3.7.1.3 Build 31790
Grid Edition: Development mode
Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
Using the Incubator Extensible Environment for Coherence Cache Configuration
Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
Type help for help or quit to exit.
Command{machine=,rack=,site=}: set site PrimaryDC
Command{machine=,rack=,site=PrimaryDC}: set machine server1
Command{machine=server1,rack=,site=PrimaryDC}: start 2
Command{machine=server1,rack=,site=PrimaryDC}: [cacheserver2:4381] 1:
[cacheserver2:4381] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver2:4381] 3: Grid Edition: Development mode
[cacheserver2:4381] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver2:4381] 5:
[cacheserver1:4380] 1:
[cacheserver1:4380] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver1:4380] 3: Grid Edition: Development mode
[cacheserver1:4380] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver1:4380] 5:
[cacheserver2:4381] 6:
[cacheserver2:4381] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver2:4381] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver2:4381] 9:
[cacheserver1:4380] 6:
[cacheserver1:4380] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver1:4380] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver1:4380] 9:
Please enter a command.
Command{machine=server1,rack=,site=PrimaryDC}: show
Partition Count: 1049, Unowned: 0
Name PID Machine Rack Site Partitions
============= ========== ============== ============== ============== ==========
cacheserver2 4381 server1 PrimaryDC null
cacheserver1 4380 server1 PrimaryDC 1049
StatusHA is ENDANGERED
Command{machine=server1,rack=,site=PrimaryDC}: show
Partition Count: 1049, Unowned: 0
Name PID Machine Rack Site Partitions
============= ========== ============== ============== ============== ==========
cacheserver2 4381 server1 PrimaryDC 524
cacheserver1 4380 server1 PrimaryDC 525
StatusHA is NODE-SAFE
All partitions balanced now.
Command{machine=server1,rack=,site=PrimaryDC}: set machine server2
Command{machine=server2,rack=,site=PrimaryDC}: start 2
Command{machine=server2,rack=,site=PrimaryDC}: [cacheserver4:4389] 1:
[cacheserver4:4389] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver4:4389] 3: Grid Edition: Development mode
[cacheserver4:4389] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver4:4389] 5:
[cacheserver3:4388] 1:
[cacheserver3:4388] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver3:4388] 3: Grid Edition: Development mode
[cacheserver3:4388] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver3:4388] 5:
[cacheserver4:4389] 6:
[cacheserver4:4389] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver4:4389] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver4:4389] 9:
[cacheserver3:4388] 6:
[cacheserver3:4388] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver3:4388] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver3:4388] 9:
Please enter a command.
Command{machine=server2,rack=,site=PrimaryDC}: show
Partition Count: 1049, Unowned: 0
Name PID Machine Rack Site Partitions
============= ========== ============== ============== ============== ==========
cacheserver4 4389 server2 PrimaryDC 350
cacheserver3 4388 server2 PrimaryDC null
cacheserver2 4381 server1 PrimaryDC 350
cacheserver1 4380 server1 PrimaryDC 349
StatusHA is NODE-SAFE
Command{machine=server2,rack=,site=PrimaryDC}: show
Partition Count: 1049, Unowned: 0
Name PID Machine Rack Site Partitions
============= ========== ============== ============== ============== ==========
cacheserver4 4389 server2 PrimaryDC 262
cacheserver3 4388 server2 PrimaryDC 262
cacheserver2 4381 server1 PrimaryDC 262
cacheserver1 4380 server1 PrimaryDC 263
StatusHA is MACHINE-SAFE
Now machine-safe within a single data centre. Let’s start up the other data centre.
Command{machine=server2,rack=,site=PrimaryDC}: set site BackupDC
Command{machine=server2,rack=,site=BackupDC}: set machine server3
Command{machine=server3,rack=,site=BackupDC}: start 2
Command{machine=server3,rack=,site=BackupDC}: [cacheserver5:4399] 1:
[cacheserver5:4399] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver5:4399] 3: Grid Edition: Development mode
[cacheserver5:4399] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver5:4399] 5:
[cacheserver6:4400] 1:
[cacheserver6:4400] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver6:4400] 3: Grid Edition: Development mode
[cacheserver6:4400] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver6:4400] 5:
[cacheserver5:4399] 6:
[cacheserver5:4399] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver5:4399] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver5:4399] 9:
[cacheserver6:4400] 6:
[cacheserver6:4400] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver6:4400] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver6:4400] 9:
Please enter a command.
Command{machine=server3,rack=,site=BackupDC}: show
Partition Count: 1049, Unowned: 0
Name PID Machine Rack Site Partitions
============= ========== ============== ============== ============== ==========
cacheserver4 4389 server2 PrimaryDC 175
cacheserver3 4388 server2 PrimaryDC 174
cacheserver6 4400 server3 BackupDC 175
cacheserver2 4381 server1 PrimaryDC 175
cacheserver5 4399 server3 BackupDC 175
cacheserver1 4380 server1 PrimaryDC 175
StatusHA is MACHINE-SAFE
Command{machine=server3,rack=,site=BackupDC}: set machine server4
Command{machine=server4,rack=,site=BackupDC}: start 2
Command{machine=server4,rack=,site=BackupDC}: [cacheserver7:4408] 1:
[cacheserver7:4408] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver7:4408] 3: Grid Edition: Development mode
[cacheserver7:4408] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver7:4408] 5:
[cacheserver7:4408] 6:
[cacheserver7:4408] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver7:4408] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver7:4408] 9:
[cacheserver8:4409] 1:
[cacheserver8:4409] 2: Oracle Coherence Version 3.7.1.3 Build 31790
[cacheserver8:4409] 3: Grid Edition: Development mode
[cacheserver8:4409] 4: Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
[cacheserver8:4409] 5:
[cacheserver8:4409] 6:
[cacheserver8:4409] 7: Using the Incubator Extensible Environment for Coherence Cache Configuration
[cacheserver8:4409] 8: Copyright (c) 2011, Oracle Corporation. All Rights Reserved.
[cacheserver8:4409] 9:
Please enter a command.
Command{machine=server4,rack=,site=BackupDC}: show
Partition Count: 1049, Unowned: 0
Name PID Machine Rack Site Partitions
============= ========== ============== ============== ============== ==========
cacheserver4 4389 server2 PrimaryDC 150
cacheserver3 4388 server2 PrimaryDC 150
cacheserver8 4409 server4 BackupDC null
cacheserver6 4400 server3 BackupDC 150
cacheserver2 4381 server1 PrimaryDC 150
cacheserver5 4399 server3 BackupDC 149
cacheserver1 4380 server1 PrimaryDC 150
cacheserver7 4408 server4 BackupDC 150
StatusHA is MACHINE-SAFE
Not quite site-safe yet, because the partitions have not all been transferred and balanced.
Command{machine=server4,rack=,site=BackupDC}: show
Partition Count: 1049, Unowned: 0
Name PID Machine Rack Site Partitions
============= ========== ============== ============== ============== ==========
cacheserver4 4389 server2 PrimaryDC 131
cacheserver3 4388 server2 PrimaryDC 131
cacheserver8 4409 server4 BackupDC 131
cacheserver6 4400 server3 BackupDC 131
cacheserver2 4381 server1 PrimaryDC 131
cacheserver5 4399 server3 BackupDC 132
cacheserver1 4380 server1 PrimaryDC 131
cacheserver7 4408 server4 BackupDC 131
StatusHA is SITE-SAFE
Now we have a site-safe configuration using the Simple Partition Assignment Strategy!
Closing thoughts
This is a great new feature that seems to have slipped in without much fanfare, and it is definitely something many people have been asking for.
As mentioned early on, running a Coherence cluster across multiple geographically dispersed sites is possible, but care should be taken when doing this. Links of 10Gb or faster with extremely low latencies are a must. You must also test the link using a tool such as the Coherence datagram test, and assess the impact of cluster traffic on your other cross-site traffic. Other factors, outside the scope of this discussion, should also be considered.
One of my colleagues has also posted about this new partitioning strategy, and has some good advice at the bottom of his post. Worth a read too!
Source Code
You can download the source code for this small example at https://blogs.oracle.com/felcey/resource/PartitionExample.zip. You will also need to download Coherence from the Oracle Technology Network (OTN) and the Coherence Commons package from the Incubator site.
The following Java classes are part of this:
- RunPartitionTest.java – main class with command line utility
- ClusterBuilderHelper.java – helper class to wrap some of the incubator classes
- PartitionHelper.java – helper class to determine statusHA without using JMX



Hi Tim,
Do you happen to know the internal details of what happens on the joining node(s) when one or more of the nodes in the Coherence cluster are restarted while the cluster is populated?
We are getting an OOM error in our JBoss server when it joins the cluster. JBoss has local storage turned off.
Thanks
Hi,
When a cache server that holds data is shut down, the primary and backup data it owned is redistributed amongst the remaining storage-enabled cache servers. If it is a graceful shutdown, this is done before the cache server shuts down. If it’s not graceful (e.g. a failure), the data is recovered by the normal recovery process.
In terms of OOM, that can happen when there are not enough cache servers left to hold the data, e.g. you had 20 cache servers and then only 5 are left. If you don’t have size limits configured, you can get an OOM.
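To put rough numbers on that, here is a back-of-the-envelope Java sketch. The figures are hypothetical (10 GB of primary data with one backup copy), the class name is illustrative, and the estimate ignores indexes, per-entry overhead and uneven distribution.

```java
// Hypothetical back-of-the-envelope estimate; ignores indexes, overhead and skew.
final class CapacitySketch {

    private CapacitySketch() {
    }

    // Approximate data each storage-enabled server holds, in MB, assuming an
    // even spread of the primary data plus backupCount backup copies of it.
    static double perServerMb(double primaryDataMb, int backupCount, int storageServers) {
        if (storageServers <= 0) {
            throw new IllegalArgumentException("need at least one storage-enabled server");
        }
        return primaryDataMb * (1 + backupCount) / storageServers;
    }
}
```

perServerMb(10240, 1, 20) gives 1024 MB per server, while perServerMb(10240, 1, 5) gives 4096 MB: each surviving server has to hold four times as much, which is where an unbounded heap can run out.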
But if you are getting OOM errors in your storage-disabled clients, it could be many things, and it is difficult to diagnose without error messages, configuration, etc.
It’s probably worth posting a question at https://forums.oracle.com/forums/forum.jspa?forumID=480 or logging an SR with Oracle Support.
Hope that helps.
Regards
Tim
Hi Tim,
I see you use WKA configurations.
We configured our Coherence cluster to use multicast across our 2 low-latency-connected data centers. In the past, Ehcache clusters performed well using multicast across our data centers.
Do you see advantages in using WKA? I guess with 2 data centers you would define WKA members in each to avoid a single point of failure? Is WKA uptime included in the site-safe status?
Thanks
Hi Luc
I just use WKA in my examples so as not to inadvertently join other clusters people are using.
If you are able to use multicast on your network, I would use it, as some operations perform better with multicast. For example, when multicast is enabled and a message needs to be sent to more than 25% of the cluster, it is sent via multicast.
There is a great article from Jon Purdy that explains why multicast is the preferred option.
Having said that, WKA will work as well, but you need to be aware of some of the operations mentioned by Jon, especially in large clusters.
By setting WKA, you effectively disable multicast communication in the cluster altogether.
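A minimal sketch of that rule as described here, assuming the 25% threshold and a strict “more than” comparison; the class and method names are hypothetical and this is not Coherence’s internal code.

```java
// Hypothetical model of the unicast/multicast choice described above.
final class TransportSketch {

    static final int DEFAULT_THRESHOLD_PERCENT = 25;

    private TransportSketch() {
    }

    // Multicast only if it is enabled and recipients exceed thresholdPercent
    // of the cluster. Integer maths avoids rounding:
    // recipients / clusterSize > thresholdPercent / 100.
    static boolean useMulticast(boolean multicastEnabled, int recipients,
                                int clusterSize, int thresholdPercent) {
        if (!multicastEnabled || clusterSize <= 0) {
            return false;
        }
        return recipients * 100L > (long) thresholdPercent * clusterSize;
    }
}
```

With 10 members, a message for 3 recipients (30%) goes via multicast, while one for 2 recipients (20%) stays unicast. With multicastEnabled set to false, as when WKA is configured, it is always unicast.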
Regards
Tim
Hi Tim
I would be very thankful if you could provide the download link for the source code of the following classes:
RunPartitionTest.java
ClusterBuilderHelper.java
PartitionHelper.java
Regards
Michel Herrera
michelherrerasanchez@gmail.com