
Posted to issues@lucene.apache.org by "gsmiller (via GitHub)" <gi...@apache.org> on 2023/02/11 14:44:33 UTC

[GitHub] [lucene] gsmiller commented on pull request #12141: Avoid duplicate sorting in KeywordField#newSetQuery (alternative approach)

gsmiller commented on PR #12141:
URL: https://github.com/apache/lucene/pull/12141#issuecomment-1426788802

   Thanks @uschindler for the alternate approach. It helped me understand your earlier suggestion to use streams, which I wasn't totally clear on. I thought you were originally suggesting that we do away with prefix-encoding altogether and reference the streams directly inside the query implementations to iterate the terms, which confused me.
   
   I'm not set up to re-run our internal benchmarks at the moment (where we see a large amount of time spent sorting terms), but I did run my simple "benchmark" test case, which times query initialization (see below). The results for this PR were as good as those for my initial proposal to share prefix-encoded terms. So, from a pure performance point of view, this appears to be just as efficient as what I'd come up with initially.
   
   Simple test case "benchmark":
   ```
     public void testSortPerformance() {
       int len = 50000;
       BytesRef[] terms = new BytesRef[len];
       for (int i = 0; i < len; i++) {
         String s = TestUtil.randomSimpleString(random(), 10, 20);
         terms[i] = new BytesRef(s);
       }
   
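       // Warm-up pass so the JIT has compiled the hot paths before timing starts.
       int iters = 300;
       for (int i = 0; i < iters; i++) {
         KeywordField.newSetQuery("foo", terms);
       }
   
       // Timed pass: keep the fastest single construction to reduce noise.
       long minTime = Long.MAX_VALUE;
       for (int i = 0; i < iters; i++) {
         long t0 = System.nanoTime();
         KeywordField.newSetQuery("foo", terms);
         minTime = Math.min(minTime, System.nanoTime() - t0);
       }
   
       // Nanoseconds to whole milliseconds (integer division truncates).
       System.err.println("Time (ms): " + minTime / 1_000_000);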
     }
   ```
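   
   For readers without the PR context, the saving being timed above can be sketched in plain Java with no Lucene classes. This is only an illustration of "sort once and share the result" versus "each consumer sorts for itself". `SortOnceSketch`, `SortedTermsConsumer`, `sortedCopy`, `buildSeparately`, and `buildShared` are hypothetical names made up for this sketch; none of them is Lucene API, and this is not how the PR itself is implemented.
   
   ```java
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.List;
   
   public class SortOnceSketch {
     /** Counts how many sorts were performed, so the saving is observable. */
     static int sortCalls = 0;
   
     /** Hypothetical stand-in for a query that needs its terms in sorted order. */
     static final class SortedTermsConsumer {
       final List<String> sortedTerms;
   
       SortedTermsConsumer(List<String> sortedTerms) {
         this.sortedTerms = sortedTerms;
       }
     }
   
     /** Returns a sorted, read-only copy; the caller's array is left untouched. */
     static List<String> sortedCopy(String[] terms) {
       sortCalls++;
       String[] copy = terms.clone();
       Arrays.sort(copy);
       return Collections.unmodifiableList(Arrays.asList(copy));
     }
   
     /** Each consumer sorts for itself: two sorts of the same input. */
     static SortedTermsConsumer[] buildSeparately(String[] terms) {
       return new SortedTermsConsumer[] {
         new SortedTermsConsumer(sortedCopy(terms)),
         new SortedTermsConsumer(sortedCopy(terms))
       };
     }
   
     /** Sort once up front, then hand the same sorted view to both consumers. */
     static SortedTermsConsumer[] buildShared(String[] terms) {
       List<String> sorted = sortedCopy(terms);
       return new SortedTermsConsumer[] {
         new SortedTermsConsumer(sorted),
         new SortedTermsConsumer(sorted)
       };
     }
   
     public static void main(String[] args) {
       String[] terms = {"pear", "apple", "fig"};
   
       sortCalls = 0;
       buildSeparately(terms);
       System.out.println("separate: " + sortCalls + " sorts");
   
       sortCalls = 0;
       buildShared(terms);
       System.out.println("shared: " + sortCalls + " sorts");
     }
   }
   ```
   
   Sharing a read-only sorted view is safe here only because neither consumer mutates it. Both variants produce the same sorted terms, so the only difference is how many times the input is sorted.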



