Posted to dev@lucene.apache.org by "Scerri, Antony (ELS)" <A....@elsevier.com> on 2016/05/17 15:33:52 UTC

Issues (with patches) using grouped result and exact stats

Hi

As part of a project using grouped results, we looked at using a sharded index. The first thing the team noticed was that some of our application tests were failing because of the term distribution across the shards. However, switching on ExactStatsCache didn't help, because the grouped results feature uses separate code paths for large parts of its functionality and exact stats were not enabled on those paths. Attempting to resolve this uncovered a couple of other issues with debug explain plans, which do not use exact stats either, so the information they report is misleading. All of this, it turned out, rested on a minor problem with the exact stats not correctly distributing term frequencies in all cases (highly dependent on your document distribution, of course).
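To illustrate why term distribution across shards matters, here is a minimal Python sketch. It is not Solr code: the shard names and counts are made up, and the formula is only a simplified IDF of the classic 1 + ln(numDocs / (docFreq + 1)) shape. It compares the IDF each shard would compute from its own statistics with the IDF computed from statistics summed across all shards, which is the idea behind exact (global) stats.

```python
import math

def idf(doc_freq, num_docs):
    # Simplified classic-style IDF: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical shards: (documents containing the term, total documents).
shards = {"shard1": (1, 1000), "shard2": (400, 1000)}

def local_idfs(shards):
    # Each shard scores using only its own statistics.
    return {name: idf(df, n) for name, (df, n) in shards.items()}

def global_idf(shards):
    # Exact stats: sum the statistics over all shards before scoring.
    total_df = sum(df for df, _ in shards.values())
    total_docs = sum(n for _, n in shards.values())
    return idf(total_df, total_docs)

local = local_idfs(shards)
exact = global_idf(shards)
```

With these made-up numbers the same term gets a much higher weight on shard1 than on shard2 when local statistics are used, so documents from different shards are not scored on a comparable scale. The summed statistics give a single value that lies between the two.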

So I have registered three bugs (listed below) for these issues, in reverse order to the descriptions above, as I went back through tackling the primary cause first and creating patches with test cases for each. Note that I did these against the 5.x branch, because when I attempted to apply them to master I couldn't get the test cases to behave as expected. After going back to 5.x, where I had originally worked through the fixes, I finally determined that the use of caching in the test case environment was the problem. I believe applying the changes to master, based on where I was a few months ago, should be fairly straightforward; sadly, I haven't had time to revisit this. Also, because of the relationship between the issues, the patch attached to each Jira issue depends on the patch for the preceding issue (hopefully this isn't too much of a problem).

SOLR-9122 - ExactStatsCache doesn't share all stats
SOLR-9123 - Explain plans not using ExactStatsCache in debug mode
SOLR-9124 - Grouped Results does not support ExactStatsCache

It is worth noting that these fixes will of course change behaviour in subtle ways, and may add some performance overhead in some cases, depending on how the features have been used.

Hopefully these changes will be accepted as-is, but should any queries arise I'll attempt to answer them as necessary.

Tony

Antony Scerri
Lead Architect, Elsevier

