Identifying Index Candidates For Queries In Your Cluster

Introduction
Coherence is very efficient at running queries in parallel across your cluster. For a “Partitioned” or “Distributed” cache, the data is stored evenly across all storage-enabled members. When you run a query, the request is sent to all nodes that hold data, executed in parallel on those nodes, and the results are returned and aggregated. For example, the following statement gets the average balance for all customers in the “South” region:


Double southBalance = (Double) nc.aggregate(new EqualsFilter("getRegion","South"),
                                            new DoubleAverage("getBalance"));

Data is stored serialized on the storage-enabled members, so any non-key-based access that requires evaluating predicates or reading attribute values means the data must be de-serialized. In the example above, both the “getRegion” call in the predicate and the “getBalance” call in the DoubleAverage aggregator cause de-serialization on the storage-enabled members, increasing query time.

Indexes are useful in this case. We can create an index on a “getter” of an object, such as “getRegion”, which tells Coherence to store de-serialized values in preparation for use by a query. This does, however, use more heap space and adds time to create/update the index whenever cache entries are inserted, updated, or deleted. It’s really a balancing act between not adding too many indexes and getting query times down.
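Conceptually, an index trades extra memory and index-maintenance work for fast lookups. The idea can be illustrated outside Coherence with a plain map acting as a reverse index (this is only a conceptual sketch, not how Coherence implements indexes internally):

```java
import java.util.*;

// A conceptual illustration (not Coherence internals): an index is a
// reverse map from attribute value to matching keys. A query that uses
// the index becomes a direct lookup instead of a scan over every entry,
// at the cost of extra memory and of maintaining the map on each update.
public class IndexConcept {
    public static void main(String[] args) {
        Map<Integer, String> cache = new HashMap<>();             // key -> region
        Map<String, Set<Integer>> regionIndex = new HashMap<>();  // region -> keys

        String[] regions = {"North", "South", "East", "West"};
        for (int i = 0; i < 100_000; i++) {
            String region = regions[i % regions.length];
            cache.put(i, region);
            // Index maintenance: this cost is paid on every insert/update/delete.
            regionIndex.computeIfAbsent(region, r -> new HashSet<>()).add(i);
        }

        // Without an index: scan (and in Coherence, de-serialize) every entry.
        int scanCount = 0;
        for (Map.Entry<Integer, String> e : cache.entrySet()) {
            if ("South".equals(e.getValue())) scanCount++;
        }

        // With an index: a single lookup.
        int indexCount = regionIndex.get("South").size();

        System.out.println("scan=" + scanCount + ", index=" + indexCount);
    }
}
```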

(Note: By using Portable Objects (see this tutorial for a good explanation and demonstration), you can limit the indexes you may require by using PofExtractors for querying. This is outside the scope of this post, but see here for more information.)

Understanding the MBeans

So how do we identify queries that are taking a long time in our cluster and may be candidates for indexes? The StorageManagerMBean (see here) exposes a number of attributes that help us identify them. In particular, the following are of interest:

  1. MaxQueryDescription – A description of the query with the longest duration that exceeds the MaxQueryThresholdMillis attribute since the statistics were last reset.
  2. MaxQueryDurationMillis – The number of milliseconds of the longest running query since the statistics were last reset, i.e. the duration of the query described above.
  3. MaxQueryThresholdMillis – A threshold, in milliseconds, for recording queries. The longest query that executes longer than this threshold is reported by MaxQueryDescription attribute. The default value is 30 ms.

There are also NonOptimized* and Optimized* metrics for average, count and total time for queries.

Running the Example

To test this, we have a Customer POJO with the following attributes. It also implements Serializable, of course, and has the appropriate getters and setters.


    private int    id;
    private String customerName;
    private String region;
    private double balance;

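For completeness, here is a minimal sketch of what such a Customer class might look like. The CACHENAME constant and the constructor signature are assumptions, inferred from how the class is used in the code below:

```java
import java.io.Serializable;

// A minimal sketch of the Customer POJO used in this post. The CACHENAME
// value and the constructor signature are assumptions inferred from usage.
public class Customer implements Serializable {
    public static final String CACHENAME = "customers";

    private int    id;
    private String customerName;
    private String region;
    private double balance;

    public Customer(int id, String customerName, String region, double balance) {
        this.id           = id;
        this.customerName = customerName;
        this.region       = region;
        this.balance      = balance;
    }

    public int    getId()           { return id; }
    public String getCustomerName() { return customerName; }
    public String getRegion()       { return region; }
    public double getBalance()      { return balance; }

    // Setters omitted here for brevity; the real class also has the
    // corresponding setters as mentioned above.
}
```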
We start up two cache servers (with JMX enabled) and run the following code to populate the cache with some data.


  public static void main(String[] args)
        {
        NamedCache             nc     = CacheFactory.getCache(Customer.CACHENAME);
        Map<Integer, Customer> buffer = new HashMap<Integer, Customer>();
        int                    batch  = 5000;
        int                    total  = 300000;
        Random                 rand   = new Random();
        String[] regions = {"North", "South", "East", "West"};

        System.out.println("Populating cache with " + total + " customers");
        nc.clear();

        for (int i = 1; i <= total; i++)
            {
            Customer c = new Customer(i, "Customer " + i, regions[i % regions.length],
                                      rand.nextDouble() * 10000);

            buffer.put(c.getId(), c);

            if (i % batch == 0)
                {
                nc.putAll(buffer);
                buffer.clear();
                System.out.print(".");
                }
            }

        if (!buffer.isEmpty())
            {
            nc.putAll(buffer);
            buffer.clear();
            }

        System.out.println("\nTotal cache size is " + nc.size());

        }

Output similar to the following is displayed. You can see the two storage-enabled members and the client member in the MemberSet.

….
MasterMemberSet(
  ThisMember=Member(Id=3, Timestamp=2012-11-19 13:07:21.257, Address=127.0.0.1:8092, MachineId=60314, Location=site:,machine:localhost,process:51508, Role=OracleCoherenceDemoPopulateCache)
  OldestMember=Member(Id=1, Timestamp=2012-11-19 13:06:34.522, Address=127.0.0.1:8088, MachineId=60314, Location=site:,machine:localhost,process:51500, Role=CoherenceServer)
  ActualMemberSet=MemberSet(Size=3
    Member(Id=1, Timestamp=2012-11-19 13:06:34.522, Address=127.0.0.1:8088, MachineId=60314, Location=site:,machine:localhost,process:51500, Role=CoherenceServer)
    Member(Id=2, Timestamp=2012-11-19 13:07:08.68, Address=127.0.0.1:8090, MachineId=60314, Location=site:,machine:localhost,process:51506, Role=CoherenceServer)
    Member(Id=3, Timestamp=2012-11-19 13:07:21.257, Address=127.0.0.1:8092, MachineId=60314, Location=site:,machine:localhost,process:51508, Role=OracleCoherenceDemoPopulateCache)
    )
  MemberId|ServiceVersion|ServiceJoined|MemberState
    1|3.7.1|2012-11-19 13:06:34.522|JOINED,
    2|3.7.1|2012-11-19 13:07:08.68|JOINED,
    3|3.7.1|2012-11-19 13:07:21.473|JOINED
  RecycleMillis=1200000
  RecycleSet=MemberSet(Size=0
    )
  )

TcpRing{Connections=[2]}
IpMonitor{AddressListSize=0}

Populating cache with 300000 customers
............................................................
Total cache size is 300000

Identifying the Longest Running Query

Next, we start up jconsole and attach to one of the running DefaultCacheServer processes. Go to the MBeans tab, expand the Coherence MBean, and drill down so that you see the attributes of one of the storage-enabled members.

View of MBeans from jconsole

Next, let's run the following code to execute two queries.


 public static void main(String[] args)
        {
        NamedCache nc = CacheFactory.getCache(Customer.CACHENAME);
        long duration = 0L;

        System.out.println("Get total balance... ");
        startTimer();
        Double totalBalance = (Double) nc.aggregate(AlwaysFilter.INSTANCE,
                                                    new DoubleSum("getBalance"));
        duration = stopTimer();

        System.out.println("Balance is " + String.format("%12.1f", totalBalance) +
                           ", time = " + duration + " ms");

        System.out.println("Get average balance for South region... ");
        startTimer();
        Double southBalance = (Double) nc.aggregate(new EqualsFilter("getRegion","South"),
                                                    new DoubleAverage("getBalance"));
        duration = stopTimer();
        System.out.println("Average Balance is " + String.format("%12.1f",southBalance) +
                           ", time = " + duration + " ms");

        }

    private static void startTimer()
        {
        start = System.currentTimeMillis();
        }

    private static long stopTimer()
        {
        stop = System.currentTimeMillis();
        return stop - start;
        }

    private static long start = 0L;
    private static long stop  = 0L;

The above queries do the following:

  1. Get the total balance of all Customers.
  2. Get the average balance for all customers in the “South” region.

Once we run this, we get output similar to the following:


Get total balance...
Balance is 1498224640.0, time = 2359 ms
Get average balance for South region...
Average Balance is       4994.2, time = 1249 ms

These query times are still pretty good, but we can improve them greatly.

When we go back to jconsole and refresh the statistics, we should see something similar to the following:

Non Optimized Query

This shows that a query using the filter EqualsFilter(.getRegion(), South) took 1161 ms to run. This is the execution time on this particular storage-enabled member; a similar statistic will be available on the other members.

The reason this query takes so long is that every Customer object must be de-serialized so that the region can be checked and, if the region is “South”, the balance retrieved. This is a lot of extra work for the cluster to do.
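To get a feel for why repeated de-serialization is expensive, here is a small self-contained sketch. It uses plain Java serialization rather than Coherence's own wire formats, so the absolute timing is only illustrative, but it shows the per-entry cost a non-indexed query pays:

```java
import java.io.*;

// Illustrates the cost of repeatedly de-serializing objects, which is
// what a non-indexed query effectively does for every entry it evaluates.
// Plain Java serialization is used here, not Coherence's format, so the
// numbers are illustrative only.
public class DeserializationCost {
    static class Entry implements Serializable {
        String region;
        double balance;
        Entry(String region, double balance) { this.region = region; this.balance = balance; }
    }

    public static void main(String[] args) throws Exception {
        // Serialize one entry once, as a stand-in for the stored binary value.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(new Entry("South", 4994.2));
        oos.flush();
        byte[] binary = bos.toByteArray();

        int evaluations = 100_000;
        long start = System.currentTimeMillis();
        int matches = 0;
        for (int i = 0; i < evaluations; i++) {
            // Without an index, every predicate evaluation pays this cost.
            Entry e = (Entry) new ObjectInputStream(
                    new ByteArrayInputStream(binary)).readObject();
            if ("South".equals(e.region)) matches++;
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(evaluations + " de-serializations took " + elapsed
                + " ms (" + matches + " matches)");
    }
}
```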

Adding Indexes

Now that we have identified the longest running query, let's add some indexes to help:


    System.out.println("Adding indexes for getBalance and getRegion...");
    // addIndex(extractor, fOrdered, comparator): fOrdered = true keeps the
    // index sorted (useful for range queries); a null comparator means the
    // natural ordering of the extracted values is used.
    nc.addIndex(new ReflectionExtractor("getBalance"), true, null);
    nc.addIndex(new ReflectionExtractor("getRegion"), true, null);
    System.out.println("Done");

Ensure that you go back to jconsole and reset the statistics by going to the Operations node and invoking “resetStatistics”. This ensures we pick up the new values. See below:

Reset Statistics

Once we re-run the queries, we get output similar to the following:


Get total balance...
Balance is 1498224640.0, time = 182 ms
Get average balance for South region...
Average Balance is       4994.2, time = 72 ms

We can see that the query time for the second query has gone down from 1249 ms to 72 ms. The time for the first query has also been reduced, from 2359 ms to 182 ms. This shows the dramatic effect adding an index can have on query time.

If we go back to jconsole and refresh the statistics, we will see that the first query is now the longest running query. This is fine, as we have still reduced the time to get the total balance of the Customers quite considerably.

Optimised Query

Conclusion
As mentioned above, POF can reduce the number of indexes required, and you may not need to add indexes at all, but they may still be necessary depending upon your queries and their execution times. Again, it's a balancing act between having a sensible number of indexes and getting the query response times you are after.

It is also worthwhile looking at the “Query Explain Plan” feature, which allows you to understand the path a query took. See this YouTube video for information on this feature.

All of these JMX statistics can be retrieved programmatically via the JMX APIs, or via Oracle Enterprise Manager, so this approach can be automated to help identify potential query improvements.
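As a sketch of what that programmatic retrieval looks like with the plain JMX API: since this example has no running cluster, it registers a stand-in MBean exposing the two attributes of interest. Against a real cluster you would instead query for names matching the Coherence StorageManager pattern (typically "Coherence:type=StorageManager,*"), but the getAttribute calls are the same:

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.*;

// Demonstrates the JMX API calls used to read StorageManager statistics
// programmatically. A stand-in MBean is registered here because there is
// no cluster; in a real deployment Coherence registers the MBeans itself.
public class QueryStatsReader {

    // Stand-in for the two StorageManagerMBean attributes we care about.
    public interface QueryStatsMBean {
        long getMaxQueryDurationMillis();
        String getMaxQueryDescription();
    }

    public static class QueryStats implements QueryStatsMBean {
        public long getMaxQueryDurationMillis() { return 1161L; }
        public String getMaxQueryDescription()  { return "EqualsFilter(.getRegion(), South)"; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();

        // In a real cluster this registration is done by Coherence.
        ObjectName name = new ObjectName("Coherence:type=StorageManager,cache=customers,nodeId=1");
        server.registerMBean(new QueryStats(), name);

        // Find all StorageManager MBeans and read the query statistics.
        Set<ObjectName> names = server.queryNames(
                new ObjectName("Coherence:type=StorageManager,*"), null);
        for (ObjectName on : names) {
            long   duration    = (Long)   server.getAttribute(on, "MaxQueryDurationMillis");
            String description = (String) server.getAttribute(on, "MaxQueryDescription");
            System.out.println(on + ": " + duration + " ms for " + description);
        }
    }
}
```

The same loop could run on a schedule and flag any member whose MaxQueryDurationMillis exceeds an agreed threshold.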

Enjoy.
