One of my colleagues, Patrick Peralta, has published an excellent article on near cache best practices. It has a great explanations of the feature as well as recommendations on configuration based upon your use case. Please see https://blogs.oracle.com/OracleCoherence/entry/oracle_coherence_near_cache_best.
Coherence Incubator 11 Released
It's here! Oracle Coherence Incubator Release 11 has now been finalized and made public. Everyone can now pick up the official Maven builds from maven.java.net or via Maven Central.
Instead of repeating all of the goodies and release details again, I'll simply refer you to take a look at my recent post. However to save you some time, here are the important…
The Coherence Incubator is on the Move!
Happy new (belated) year to everyone!
I thought I would start of this year with a link to Brian Oliver’s post on the Coherence Incubator moving to open-source. I know there has been a lot of hard work to get the Incubator moved to java.net as well as all the maven integration Brian has done to make it easy to incorporate into projects.
See http://brianoliver.wordpress.com/2013/02/03/coherence-incubator-11-coming-to-an-open-source-repository-near-you-soon for his announcement.
Great work Brian!
Avoiding Unintended Locking When Retrieving Related Objects in Entry Processors
Introduction
In the post I wrote on Partition Level Transactions, I showed how we could use data affinity (see here) to update multiple caches in a single atomic operation. This is a very powerful feature and I thought I would just add a bit more to the conversation around this.
When you implement key association, you get “related” entries always stored in the same partitions. E.g. you may have key association between a Customer and Orders on the customerId attribute. In this case you will always get the Orders for customer 1 in the same partition as the customer object “Customer 1″.
As described in the post above you can use “Partition Level Transactions” to update related cache objects in this way. But the other more obvious advantage is that if you are running an Entry Processor (EP) on Order 200 for Customer 1, you know that Customer 1 is in the same partition and you can get directly access the related object (Customer) via the BackingMapManagerContext.getBackingMapEntry() method.
This means direct memory access to the Objects rather than having to get them from another cluster member. Very fast!
The Potential Unintended Side Effect
On first glance the following code (from the previous blogs EP) makes sense if you just wanted to access the Customer object while running an EP against an Order entry:
@Override
public Object process(Entry entry)
{
...
BackingMapManagerContext ctx = ((BinaryEntry) entry).getContext();
// Get the Customer but we dont want to update. This will work, but the side effect
// is the Customer key will be added to "sandbox" of changes and locked.
BinaryEntry customerEntry =
(BinaryEntry) orderEntry.getContext().getBackingMapContext(Customer.CACHENAME)
.getBackingMapEntry(ctx.getKeyToInternalConverter()
.convert(newOrder.getCustomerId()));
Customer thisCustomer = ((Customer) customerEntry.getValue());
...
This will certainly work, but remember that calling getBackingMapEntry() adds the retrieved BinaryEntry to the “sandbox” of changes, which includes also applying the EP locking semantics to that key. So even if you wanted to just read the value, you are locking it for the duration of the EP as well.
This would then queue any other updates via EP’s to the Customer which is not what we intend in this case.
How to Avoid Locking When Getting Related Objects
This is where you need to subtly change the code to access the BackingMap directly (which only works for caches related via key association) to get the required Customer object. The following code shows how to get an Object, which is related object using getBackingMap().get() instead of getBackingMapEntry() in the case where you don’t want to update the related object:
@Override
public Object process(Entry entry)
{
...
BackingMapManagerContext ctx = ((BinaryEntry) entry).getContext();
// Get the Customer but we dont want to update. This will work, and no side effect of
// of locking.
Binary binCustomer =
(Binary)ctx.getBackingMapContext(Customer.CACHENAME)
.getBackingMap().get(ctx.getKeyToInternalConverter()
.convert(newOrder.getCustomerId()));
// Because we are getting a Binary instead of BinaryEntry, we need to deserialize with the
// serlializer for the Service.
Customer c = (Customer) ExternalizableHelper
.fromBinary(binCustomer, ctx.getCacheService().getSerializer());
...
Conclusion
So key things to remember are:
- Use getBackingMap().get() if you just want to retrieve related cache objects you know are in the same partition and don’t want to update them.
- Use getBackingMapEntry() if you want to update a related entry using Partition Level Transactions.
Enjoy you holiday break and see you next year.
Oracle Coherence 3.7 New Features Series
After the busy week with the Melbourne Special Interest Group (SIG), which was a great success and customer visits, plus lots of other work, I’ve finally got around to a blog post. (BTW, I think Sydney would be a great location for our next SIG sometime in the new year!)
There have been many exciting new features in Coherence 3.7, and the “Oracle Learning Library” has a number of hands-on and eCourses that go through some of these features.
Some of the topics covered include:
- Coherence 3.7 New Features Overview.
- Coherence Query Language, Trace and Explain-plan – Overview and hands-on.
- Elastic Data – Overview and hands on.
- Portable Object Format Annotations – hands-on.
- Coherence REST – Overview and hands-on.
- Partition Level Transactions – Overview.
- Coherence*Extend Load Balancing – Overview.
All these can be found at the “Oracle Learning Library” and are worth a look for both experienced Coherence developers, or newbies.
Identifying Index Candidates For Queries In Your Cluster
Introduction
Coherence is very efficient in running queries in parallel across your cluster. For a “Partitioned” or “Distributed” cache, the data is stored evenly across all storage-enabled members. When you run a query, the request is sent to all nodes that hold data, executed in parallel on those nodes, and the results returned and aggregated. For example the following statement will get the average balance for all customers in the “South” region:
Double southBalance = (Double) nc.aggregate(new EqualsFilter("getRegion","South"),
new DoubleAverage("getBalance"));
Data is stored serialized in the storage-enabled members, so any non-key based access that require predicates or access to data means the data must be de-serlialized. In the example above the “getRegion” for the predicate as well as the “getBalance” for the DoubleAverage will cause de-serialization to occur on the storage-enabled members and increase query time.
Indexes are useful in this case. We can create and index on a “getter” of an object, such as “getRegion”, which will tell Coherence to store de-serliazed values in preparation for use by a query. This does however use more heap space and requires more time to create/update the index on insertion, update or deletion of cache entries. Its really a balancing act ensuring you don’t add too many indexes, versus getting query times down.
(Note: By using Portable Objects (see this tutorial for a good explanation and demonstration.), you can limit the indexes you may require by using PofExtractors for querying. This is outside the scope of this post, but see here for more information.)
Understanding the MBeans
So how do we identify queries that are taking a long time in our cluster and may be candidates for indexes? Within the StorageManagerMBean (see here), there are a number of attributes that help us identify this. In particular the following are of interest:
- MaxQueryDescription - A description of the query with the longest duration that exceeds the MaxQueryThresholdMillis attribute since the statistics were last reset.
- MaxQueryDurationMillis - The number of milliseconds of the longest running query since the statistics were last reset. ie. For the above Query.
- MaxQueryThresholdMillis - A threshold, in milliseconds, for recording queries. The longest query that executes longer than this threshold is reported by MaxQueryDescription attribute. The default value is 30 ms.
There are also NonOptimized* and Optimized* metrics for average, count and total time for queries.
Running the Example
To test this we have a Customer POJO with the following attributes. It also implements Serializable of course and with appropriate setters and getters.
private int id;
private String customerName;
private String region;
private double balance;
We start up 2 cache servers (with JMX enabled), and run the following code to populate with some data.
public static void main(String[] args)
{
NamedCache nc = CacheFactory.getCache(Customer.CACHENAME);
Map<Integer, Customer> buffer = new HashMap<Integer, Customer>();
int batch = 5000;
int total = 300000;
Random rand = new Random();
String[] regions = {"North", "South", "East", "West"};
System.out.println("Populating cache with " + total + " customers");
nc.clear();
for (int i = 1; i <= total; i++)
{
Customer c = new Customer(i, "Customer " + i, regions[i % regions.length],
rand.nextDouble() * 10000);
buffer.put(c.getId(), c);
if (i % batch == 0)
{
nc.putAll(buffer);
buffer.clear();
System.out.print(".");
}
}
if (!buffer.isEmpty())
{
nc.putAll(buffer);
buffer.clear();
}
System.out.println("\nTotal cache size is " + nc.size());
}
Output similar to the following is displayed. You can see the 2 storage enable members and the client member in the MemberSet.
….
MasterMemberSet(
ThisMember=Member(Id=3, Timestamp=2012-11-19 13:07:21.257, Address=127.0.0.1:8092, MachineId=60314, Location=site:,machine:localhost,process:51508, Role=OracleCoherenceDemoPopulateCache)
OldestMember=Member(Id=1, Timestamp=2012-11-19 13:06:34.522, Address=127.0.0.1:8088, MachineId=60314, Location=site:,machine:localhost,process:51500, Role=CoherenceServer)
ActualMemberSet=MemberSet(Size=3
Member(Id=1, Timestamp=2012-11-19 13:06:34.522, Address=127.0.0.1:8088, MachineId=60314, Location=site:,machine:localhost,process:51500, Role=CoherenceServer)
Member(Id=2, Timestamp=2012-11-19 13:07:08.68, Address=127.0.0.1:8090, MachineId=60314, Location=site:,machine:localhost,process:51506, Role=CoherenceServer)
Member(Id=3, Timestamp=2012-11-19 13:07:21.257, Address=127.0.0.1:8092, MachineId=60314, Location=site:,machine:localhost,process:51508, Role=OracleCoherenceDemoPopulateCache)
)
MemberId|ServiceVersion|ServiceJoined|MemberState
1|3.7.1|2012-11-19 13:06:34.522|JOINED,
2|3.7.1|2012-11-19 13:07:08.68|JOINED,
3|3.7.1|2012-11-19 13:07:21.473|JOINED
RecycleMillis=1200000
RecycleSet=MemberSet(Size=0
)
)
TcpRing{Connections=[2]}
IpMonitor{AddressListSize=0}
Populating cache with 300000 customers
............................................................
Total cache size is 300000
Identifying the Longest Running Query
Next we startup jconsole and attach to one of the running DefaultCacheServer processes. Go to MBeans tab, and expand the Coherence MBean and drill down so that you see the attributes of one of the storage-enabled members.
Next lets run the following code to execute 2 queries.
public static void main(String[] args)
{
NamedCache nc = CacheFactory.getCache(Customer.CACHENAME);
long duration = 0L;
System.out.println("Get total balance... ");
startTimer();
Double totalBalance = (Double) nc.aggregate(AlwaysFilter.INSTANCE,
new DoubleSum("getBalance"));
duration = stopTimer();
System.out.println("Balance is " + String.format("%12.1f", totalBalance.floatValue()) +
", time = " + duration + " ms");
System.out.println("Get average balance for South region... ");
startTimer();
Double southBalance = (Double) nc.aggregate(new EqualsFilter("getRegion","South"),
new DoubleAverage("getBalance"));
duration = stopTimer();
System.out.println("Average Balance is " + String.format("%12.1f",southBalance) +
", time = " + duration + " ms");
}
private static void startTimer()
{
start = System.currentTimeMillis();
}
private static long stopTimer()
{
stop = System.currentTimeMillis();
return stop - start;
}
private static long start = 0L;
private static long stop = 0L;
The above queries do the following:
- Get the total balance of all Customers.
- Get the average balance for all customers in the “South” region.
Once we run this, we get output similar to the following:
Get total balance... Balance is 1498224640.0, time = 2359 ms Get average balance for South region... Average Balance is 4994.2, time = 1249 ms
Now these query times are still pretty good but we can improve them greatly.
When we go back to jconsole and refresh the statistics, we should see something similar to the following:
This shows that a query using the filter EqualsFilter(.getRegion(), South)) has taken 1161ms to run. This is the execution time on this particular storage-enabled member. A similar statistic will be available on other members.
The reason that this query is taking this long is that every Customer object is deserialized so that the region can be checked and the balance then retrieved if the region is “South”. This is a lot of extra work that the cluster has to do.
Adding Indexes
Now that we have identified the longest running query, lets add some indexes to help us here:
System.out.println("Adding indexes for getBalance and getRegion...");
nc.addIndex(new ReflectionExtractor("getBalance"), true, null);
nc.addIndex(new ReflectionExtractor("getRegion"), true, null);
System.out.println("Done");
Ensure that you go back to jconsole and reset the statistics by going to the Operations node and clicking on “resetStatistics”. This ensures we can pick up the new values. See below:
Once we re-run the queries, we get output similar to the following:
Get total balance... Balance is 1498224640.0, time = 182 ms Get average balance for South region... Average Balance is 4994.2, time = 72 ms
We can see that the query time for the second query has now gone down to 72 ms from 1249 ms. The time for the first query has also reduced to 182 ms from 2359 ms. We can see the dramatic effect of adding the index on the query time.
If we go back to jconsole and refresh the statistics we will so that the first query is now the longest running query. This is ok as we have still reduced the time to get the
total balance from the Customers quite considerably.
Conclusion
As mentioned above, POF helps us reduce the number of indexes required and you may not need to add indexes, but indexes may still be required depending upon the queries and their execution times. Again its a balancing act between having a sensible number of indexes and getting the query response times your are after.
It is also worth while looking at the “Query Explain Plan” feature, which allows you to understand the path a query took. See this Youtube Video for information on this feature.
All of these JMX statistics can be retrieved programatically, via JMX APIs and via Oracle Enterprise Manager, so this approach can be used on an automated basis to help identify potential query improvements.
Enjoy.
Coherence Special Interest Group (SIG) – Melbourne, 18th October
We are fortunate to be able to host a ‘Traveling Coherence SIG’ in Melbourne on Thursday 18th October. If you are not familiar with the Coherence SIG’s, see http://coherence.oracle.com/display/CSIG/Home for more details.
We have a number of guest speakers including Brian Oliver – Senior Principal Solutions Architect (Oracle), Craig Blitz – Senior Group Product Manager (Oracle), Pankaj Kambo – Senior Solution Architect (University of Melbourne), and yours truly.
These events run regularly in the US and AMEA, but this is the first time we have been able to lure the experts down-under. It will be a great day with some interesting and informative sessions from both oracle and Customers. It’s also a great opportunity to talk with the developers and architects of the products as well as other Customers using Coherence.
For the details on this particular SIG please see the following URL: http://coherence.oracle.com/display/CSIG/18+October+2012+-+Melbourne,+Australia



