You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/04/15 14:05:59 UTC

[GitHub] [lucene] ChrisHegarty opened a new pull request, #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

ChrisHegarty opened a new pull request, #812:
URL: https://github.com/apache/lucene/pull/812

   # Description
   
   SortedSetDV faceting (and friends), can improve performance within tight loops by using invokevirtual (rather than invokeinterface). The C2 JIT compiler can produce slightly more optimal code in this case, and since these loops are very hot, the impact can be significant (in the order of 10-20%).
   
   This issue is in some ways similar, and builds upon, prior optimisations in this area, like say [LUCENE-5300](https://issues.apache.org/jira/browse/LUCENE-5300).
   # Solution
   
   The code change amounts to using `SortedDocValues` or `SortedSetDocValues` class types, rather than the `DocIdSetIterator` interface type, in loops (specifically for invocation of `nextDoc()`, when the iterator type is known and not wrapped.
   
   # Tests
   
   No new tests. Existing tests all pass successfully.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://github.com/apache/lucene/blob/main/CONTRIBUTING.md) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] ChrisHegarty commented on pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

Posted by GitBox <gi...@apache.org>.
ChrisHegarty commented on PR #812:
URL: https://github.com/apache/lucene/pull/812#issuecomment-1105295002

   >What I'm unsure of is whether we have good coverage for all combinations of single-valued/multi-valued and no-deletes/with-deletes. 
   
   Running code coverage shows that there are some gaps across the various combinations. Some of these gaps could be accounted for due to randomness in the test, but it seems desirable to have at least some minimal testing for all combinations regardless of randomness. I'll propose a few follow up PRs to "fill in" these gaps.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

Posted by GitBox <gi...@apache.org>.
ChrisHegarty commented on code in PR #812:
URL: https://github.com/apache/lucene/pull/812#discussion_r851301618


##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java:
##########
@@ -126,23 +126,41 @@ private void countAll(IndexReader reader) throws IOException {
 
       NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
       if (singleValued != null) {
-        for (int doc = singleValued.nextDoc();
-            doc != DocIdSetIterator.NO_MORE_DOCS;
-            doc = singleValued.nextDoc()) {
-          if (liveDocs != null && liveDocs.get(doc) == false) {
-            continue;
+        if (liveDocs == null) {

Review Comment:
   This change just hoists the null check outside of the (potentially hot) loop. In our analysis we don't observe that C2 can consistently automatically do this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] ChrisHegarty commented on pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

Posted by GitBox <gi...@apache.org>.
ChrisHegarty commented on PR #812:
URL: https://github.com/apache/lucene/pull/812#issuecomment-1105295000

   >What I'm unsure of is whether we have good coverage for all combinations of single-valued/multi-valued and no-deletes/with-deletes. 
   
   Running code coverage shows that there are some gaps across the various combinations. Some of these gaps could be accounted for due to randomness in the test, but it seems desirable to have at least some minimal testing for all combinations regardless of randomness. I'll propose a few follow up PRs to "fill in" these gaps.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

Posted by GitBox <gi...@apache.org>.
mikemccand commented on PR #812:
URL: https://github.com/apache/lucene/pull/812#issuecomment-1106493050

   > Thanks @mikemccand. It's good to see that some got faster. Strange that some got a little slower too though. I'll take a look and try to reproduce. It'll also be interesting to see how these behave over a couple of nightly runs.
   
   Great, thanks @ChrisHegarty -- the nightly box has many cores (see https://blog.mikemccandless.com/2021/01/apache-lucene-performance-on-128-core.html), and runs `wikimediumall`, in case that matters.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] ChrisHegarty commented on pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

Posted by GitBox <gi...@apache.org>.
ChrisHegarty commented on PR #812:
URL: https://github.com/apache/lucene/pull/812#issuecomment-1106484402

   Thanks @mikemccand. It's good to see that some got faster. Strange that some got a little slower too though. I'll take a look and try to reproduce. It'll also be interesting to see how these behave over a couple of nightly runs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

Posted by GitBox <gi...@apache.org>.
mikemccand commented on PR #812:
URL: https://github.com/apache/lucene/pull/812#issuecomment-1114815361

   Curiously, the [`dayOfYear` flat facets](https://home.apache.org/~mikemccand/lucenebench/BrowseDayOfYearSSDVFacets.html) performance seems to have already improved, though perhaps with higher variance.  Maybe that one datapoint (annotation `EI`) was a noisy aberration?  I think we can leave it be?  The speedup to the low cardinality hierarchical case is dramatic.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] ChrisHegarty commented on pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

Posted by GitBox <gi...@apache.org>.
ChrisHegarty commented on PR #812:
URL: https://github.com/apache/lucene/pull/812#issuecomment-1100130595

   I added some luceneutil benchmark output in the JIRA issue, but while positive someone more familiar running these benchmarks should verify in their own environment.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

Posted by GitBox <gi...@apache.org>.
mikemccand commented on PR #812:
URL: https://github.com/apache/lucene/pull/812#issuecomment-1106435368

   Thanks @ChrisHegarty!  What an awesome improvement.
   
   Some of the [nightly faceting tasks](https://home.apache.org/~mikemccand/lucenebench/) got a nice bump, e.g. [pure browse hierarchical YYYY/MM/DD `lastModiefied` case](https://home.apache.org/~mikemccand/lucenebench/BrowseDateSSDVFacets.html).  But others seem to have gotten slower, e.g. [`dayOfYear` flat facets](https://home.apache.org/~mikemccand/lucenebench/BrowseDayOfYearSSDVFacets.html).
   
   I'll add an annotation to the charts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] ChrisHegarty commented on pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

Posted by GitBox <gi...@apache.org>.
ChrisHegarty commented on PR #812:
URL: https://github.com/apache/lucene/pull/812#issuecomment-1100132616

   The perf improvements come from changing the target type of the `nextDoc` invocations - which results in an invokevirtual rather than an invokeinterface. The changes in this PR proposed to add variants of countXX where the aforementioned is possible, but alternatively branches could be the existing code (rather than adding new methods), which achieves similar perf results.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] ChrisHegarty commented on pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

Posted by GitBox <gi...@apache.org>.
ChrisHegarty commented on PR #812:
URL: https://github.com/apache/lucene/pull/812#issuecomment-1115774306

   @mikemccand I agree - I think we can leave it be. We should keep an eye on it though, to see how the variance plays out (it's likely hitting bad profile pollution on some runs).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz merged pull request #812: LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types

Posted by GitBox <gi...@apache.org>.
jpountz merged PR #812:
URL: https://github.com/apache/lucene/pull/812


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org