You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "Chris M. Hostetter (Jira)" <ji...@apache.org> on 2020/03/04 17:57:00 UTC
[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness

    [ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051469#comment-17051469 ] 

Chris M. Hostetter commented on SOLR-13132:
-------------------------------------------

Hey Michael,

I spent a little time looking at the PR – not an in depth review, just poking around, to try to get a an overall sense of the idea...

*TL:DR:* Definitely some bugs, needs tests, i'd like to help but it's a bit overwhelming to try and review as a monolithic PR
----
The first thing that jumped out at me is that there are no tests, or test modifications – that seemes concerning.

With something like this, where we're trying to optimize the behavior through caching, it's tempting to assume that as long as the change doesn't break existing tests that's "good enough" because we're not adding new functionality/APIs – but there's a few problems with that assumption:
 # There is actually a functionality/API change, in the form of the new config/hueristics/params controlling when/if to take advantage of the cache (and in this specific case of this PR: hueristics on when/if to do a single sweep). those options/hueristics need tested to ensure they behave as expected (which can typically be done using white box inspection of cache metrics, using things like the /metrics API)
 # existing tests that were written w/o any idea that caching might be invovled in the underlying code rarely follow usage patterns that would typically trigger cache hits – ie: a test that does the exact same query (or facet) twice in a row isn't likely to be written unless/untill you know there may/should be a cache hit involved.

Because all these changes are esentially a No-Op (w/o an explicit "termFacetCache" configured and/or an explicit query option) then even if existing tests would just happen to trigger the cache (or sweep) logic that won't matter w/o test configuration changes.

I did a quick and dirty hack, demonstrated by this little patch below, to try and *force* the "termFacetCache" to be enabled for all tests (and all term faceting, regardless of how small the count was) and was suprised to see a *LOT* of test errors/failures (I got 63 just in solr core)...
{noformat}
diff --git a/solr/core/src/java/org/apache/solr/request/TermFacetCache.java b/solr/core/src/java/org/apache/solr/request/TermFacetCache.java
index 3f22449b380..b818d2723d5 100644
--- a/solr/core/src/java/org/apache/solr/request/TermFacetCache.java
+++ b/solr/core/src/java/org/apache/solr/request/TermFacetCache.java
@@ -28,7 +28,7 @@ import org.apache.solr.search.QueryResultKey;
 public class TermFacetCache {
 
   public static final String NAME = "termFacetCache";
-  public static final int DEFAULT_THRESHOLD = 5000;
+  public static final int DEFAULT_THRESHOLD = 1; // nocommit: HACK to force all tests to use cache // 5000;
 
 
   public static final class FacetCacheKey {
diff --git a/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java b/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.jav
index ae47aed273b..8767a8f4db2 100644
--- a/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
+++ b/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
@@ -282,7 +282,8 @@ public class SolrIndexSearcher extends IndexSearcher implements Closeable, SolrI
       if (documentCache != null) clist.add(documentCache);
 
       if (solrConfig.userCacheConfigs.isEmpty()) {
-        cacheMap = NO_GENERIC_CACHES;
+        // nocommit: HACK to force TermFacetCache in all testing...
+        cacheMap = new HashMap<>(); // nocommit: cacheMap = NO_GENERIC_CACHES;
       } else {
         cacheMap = new HashMap<>(solrConfig.userCacheConfigs.size());
         for (Map.Entry<String,CacheConfig> e : solrConfig.userCacheConfigs.entrySet()) {
@@ -294,6 +295,22 @@ public class SolrIndexSearcher extends IndexSearcher implements Closeable, SolrI
         }
       }
 
+      { // nocommit: HACK to force TermFacetCache in all testing...
+                final Map<String,String> nocommitArgs = new HashMap<>();
+        nocommitArgs.put("name",org.apache.solr.request.TermFacetCache.NAME);
+        nocommitArgs.put("size","200");
+        nocommitArgs.put("initialSize","200");
+        nocommitArgs.put("autowarmCount","200");
+        
+        final SolrCache nocommitCache =
+          (new CacheConfig(CaffeineCache.class,
+                           nocommitArgs,
+                           new org.apache.solr.request.TermFacetCacheRegenerator())).newInstance();
+        
+        cacheMap.put(nocommitCache.name(), nocommitCache);
+        clist.add(nocommitCache);
+      }
+
       cacheList = clist.toArray(new SolrCache[clist.size()]);
     } else {
       this.filterCache = null;
{noformat}
I didn't analyze the results in depth, but at a glamce the failures seem to fall into a few categories:
 * term facet constraints out of order and/or unexpected counts
 ** suggesting that the cache is returning incorrect results or results for the wrong key?
 * ArrayIndexOutOfBoundsException in various faceting classes
 * NullPointerException from FacetCacheKey.hashCode()
 * "UnsupportedOperationException: reset not supported for cached count" in the new CacheUpdateCountSlotAcc

While that patch above is a _HUGE_ hack and obviously not committable, applying it should make it easy to spot some problems and help inspire some fixes / new tests.
----
While the above tests were running, before I realized how many failures there were, I also tried building solr server and doing some manual testing of relatedness aggreations (using the example docs/request from the ref-guide: [https://lucene.apache.org/solr/guide/8_4/json-facet-api.html#semantic-knowledge-graph-example]) and inspection of the cache metrics.

The first thing that jumped out at me was that while I kept seeing cache _lookups_ I wasn't seeing any cache _inserts_ (this hitratio was always 0)

playing around with the requests/options lead me to realize that the problem seemed to be that the fields I was faceting on didn't have docValues, leading me to deduce the problems is aparently FacetFieldProcessorByArrayUIF ... FacetFieldProcessor has modifications to support the use of the cache in subclasses and allow lookups by the collectors, and FacetFieldProcessorByArrayDV has code to insert collected results into the cache, but FacetFieldProcessorByArrayUIF has no modifications at all (so no reason if/why faceting using that processor would result in cache insertions)

Which just goes to show the importance of new (whitebox) tests asserting cache hits/miss metrics when expected. :)
----
From a pure code spelunking standpoint, poking around here and there, I was suprised to see {{QueryResultKey}} modified by this PR, since I couldn't think of any reason why this new facet caching should involve/affect the {{queryResultsCache}}.

As best I can tell, this seems to just be an attempt at code reuse? .. {{TermFacetCache}} needs a key that involves an (unordered) list of "filters" so the PR modifies {{QueryResultKey}} to try and compose it in inside {{FacetCacheKey}}.

But in the process these changes to {{QueryResultKey}} break it's primary usage by treating the main "q" param as just another "filter". For it's _intended_ purpose (the {{queryResultsCache}} ) the resulting sort order of the documents matters, so {{q=foo&fq=bar}} must *NOT* result in a cache hit for {{q=bar&fq=foo}} – but with the changes in the current PR the following tests this is no longer true, causing a test like this one (which would currently pass on master) to fail...
{noformat}
diff --git a/solr/core/src/test/org/apache/solr/core/QueryResultKeyTest.java b/solr/core/src/test/org/apache/solr/core/QueryResultKeyTest.java
index dc1e48b17b3..9d627146595 100644
--- a/solr/core/src/test/org/apache/solr/core/QueryResultKeyTest.java
+++ b/solr/core/src/test/org/apache/solr/core/QueryResultKeyTest.java
@@ -106,6 +106,17 @@ public class QueryResultKeyTest extends SolrTestCaseJ4 {
 
     assertKeyNotEquals(key1, key2);
   }
+  
+  public void testSwappingQueryAndFilterIsNotEquals() {
+    final Query qx = new FlatHashTermQuery("x");
+    final Query qy = new FlatHashTermQuery("y");
+    
+    final QueryResultKey x = new QueryResultKey(qx, Arrays.asList(qy), null, 0);
+    final QueryResultKey y = new QueryResultKey(qy, Arrays.asList(qx), null, 0);
+    
+    assertKeyNotEquals(x, y);
+  }
+    
 
   public void testRandomQueryKeyEquality() {

{noformat}
I would strongly suggest reverting your changes to {{QueryResultKey}}, and instead refactor the {{unorderedCompare}} logic into a helper class where it can be re-used by {{TermFacetCache}} ( wrapping a simple {{List<Query>}} instead of a hacked {{QueryResultKey}} )
----
In general, to me at least, the changes in this PR feel a bit overwhelming and I'm having a hard time wrapping my head around them all at once.

I fully understand how tackling the "do relatedness() calculatios in a single sweep" implementation naturally lead you to "implement a termFacetCache" at the same time to gain more speed ups, and that you wound up doing that work together and it all getting commingled, but from the standpoint of encouraging/facilitating review, feedback, adoption (from me or from others) I would suggest you to split these out into distinct PRs (even if one is a dependent fork/branch of the other) so that it's easier to for reviewers to mentally digest. As things stand right now I find myself getting confused as to when/why various changes in various classes exist: to support caching? to support sweeping? to support caching while sweeping?

> Improve JSON "terms" facet performance when sorted by relatedness 
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132-with-cache-01.patch, SOLR-13132-with-cache.patch, SOLR-13132.patch
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain base docSet, and then uses that initial pass as a pre-filter for a second-pass, inverted approach of fetching docSets for each relevant term (i.e., {{count > minCount}}?) and calculating intersection size of those sets with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and set intersection operations increases request latency to the point where relatedness sort may not be usable in practice (for my use case, even after applying the patch for SOLR-13108, for a field with ~220k unique terms per core, QTime for high-cardinality domain docSets were, e.g.: cardinality 1816684=9000ms, cardinality 5032902=18000ms).
> The attached patch brings the above example QTimes down to a manageable ~300ms and ~250ms respectively. The approach calculates uninverted facet counts over domain base, foreground, and background docSets in parallel in a single pass. This allows us to take advantage of the efficiencies built into the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}}), and avoids the per-term docSet creation and set intersection overhead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org