You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "vsop-479 (via GitHub)" <gi...@apache.org> on 2023/11/27 06:56:23 UTC

[PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

vsop-479 opened a new pull request, #12846:
URL: https://github.com/apache/lucene/pull/12846

   We could copy collected acc into buffered acc, rather than merge them, in bufferSkip phase.
   Because the target CompetitiveImpactAccumulator is cleared in last writeSkipData phase. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1444694744


##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,21 @@ public void addAll(CompetitiveImpactAccumulator acc) {
     assert assertConsistent();
   }
 
+  /** Copy {@code acc} into this empty acc. */
+  public void copy(CompetitiveImpactAccumulator acc) {
+    int[] maxFreqs = this.maxFreqs;
+    int[] otherMaxFreqs = acc.maxFreqs;
+    assert Arrays.stream(maxFreqs).sum() == 0;
+
+    System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
+
+    for (Impact entry : acc.otherFreqNormPairs) {
+      add(entry, otherFreqNormPairs);
+    }

Review Comment:
   Let's remove the requirement that the accumulator is empty, it doesn't seem necessary? We can then simply implement this copy logic like that?
   
   ```
   System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
   otherFreqNormPairs.clear();
   otherFreqNormPairs.addAll(other.otherFreqNormPairs);
   ```
   
   And we can remove the `empty` flag on this class?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1444698183


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99SkipWriter.java:
##########
@@ -202,7 +202,11 @@ protected void writeSkipData(int level, DataOutput skipBuffer) throws IOExceptio
     CompetitiveImpactAccumulator competitiveFreqNorms = curCompetitiveFreqNorms[level];
     assert competitiveFreqNorms.getCompetitiveFreqNormPairs().size() > 0;
     if (level + 1 < numberOfSkipLevels) {
-      curCompetitiveFreqNorms[level + 1].addAll(competitiveFreqNorms);
+      if (curCompetitiveFreqNorms[level + 1].isEmpty()) {
+        curCompetitiveFreqNorms[level + 1].copy(competitiveFreqNorms);
+      } else {
+        curCompetitiveFreqNorms[level + 1].addAll(competitiveFreqNorms);
+      }

Review Comment:
   This optimization doesn't look like it would kick in often, does it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1447075161


##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,21 @@ public void addAll(CompetitiveImpactAccumulator acc) {
     assert assertConsistent();
   }
 
+  /** Copy {@code acc} into this empty acc. */
+  public void copy(CompetitiveImpactAccumulator acc) {
+    int[] maxFreqs = this.maxFreqs;
+    int[] otherMaxFreqs = acc.maxFreqs;
+    assert Arrays.stream(maxFreqs).sum() == 0;
+
+    System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
+
+    for (Impact entry : acc.otherFreqNormPairs) {
+      add(entry, otherFreqNormPairs);
+    }

Review Comment:
   > And we can remove the empty flag on this class?
   
   The empty flag is used to indicate wether non-zero level acc is empty, but the optimization for non-zero level acc doesn't  kick in often, so I removed it.
   
   I think ``otherFreqNormPairs.clear()`` is unnecessary, because ``curCompetitiveFreqNorms[0]`` is cleared in ``writeSkipData``.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1833216581

   @jpountz I applied copy for all level's empty acc, Please take a look when you get a chance.
   As so far I just performed copy on empty acc that cleared by ``clear``. Do you mean we could skip the ``clear`` (mainly ``Arrays.fill(maxFreqs, 0)``) after ``writeSkipData``, and override the stale acc in next writing directly?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1448225023


##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,20 @@ public void addAll(CompetitiveImpactAccumulator acc) {
     assert assertConsistent();
   }
 
+  /** Copy {@code acc} into this empty acc. */
+  public void copy(CompetitiveImpactAccumulator acc) {
+    assert Arrays.stream(maxFreqs).sum() == 0;
+    assert otherFreqNormPairs.isEmpty();

Review Comment:
   > why should the object currently be empty?
   
   I think the caller should ensure current acc is empty when calling ``copy()``. If current acc is not empty, the caller  should call ``CompetitiveImpactAccumulator.addAll()`` to merge them.
   
   But you are right, it is the caller's responsibility to decide to call which low level implementation.  I will fix it!
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1444694744


##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,21 @@ public void addAll(CompetitiveImpactAccumulator acc) {
     assert assertConsistent();
   }
 
+  /** Copy {@code acc} into this empty acc. */
+  public void copy(CompetitiveImpactAccumulator acc) {
+    int[] maxFreqs = this.maxFreqs;
+    int[] otherMaxFreqs = acc.maxFreqs;
+    assert Arrays.stream(maxFreqs).sum() == 0;
+
+    System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
+
+    for (Impact entry : acc.otherFreqNormPairs) {
+      add(entry, otherFreqNormPairs);
+    }

Review Comment:
   Let's remove the requirement that the accumulator is empty, it doesn't seem necessary? We can then simply implement this copy logic like that?
   
   ```
   System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
   otherFreqNormPairs.clear();
   otherFreqNormPairs.addAll(other.otherFreqNormPairs);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1855623955

   @jpountz 
   I moved the copy/addAll selection to the caller, and reverted treeSet's addAll to iterate add, since there is no speedup with `TreeSet.addAll` in JMH.
   
   Please take a look when you get a chance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1884449395

   @jpountz 
   I removed the optimization for non-zero level acc, and the empty flag. 
   Please take a look when you get a chance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1827252377

   @jpountz Please take a look when you get a chance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1405810450


##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,21 @@ public void addAll(CompetitiveImpactAccumulator acc) {
     assert assertConsistent();
   }
 
+  /** Copy {@code acc} into this empty acc. */
+  public void copy(CompetitiveImpactAccumulator acc) {
+    int[] maxFreqs = this.maxFreqs;
+    int[] otherMaxFreqs = acc.maxFreqs;
+    assert Arrays.stream(maxFreqs).sum() == 0;
+
+    System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
+
+    for (Impact entry : acc.otherFreqNormPairs) {
+      add(entry, otherFreqNormPairs);
+    }

Review Comment:
   It looks like we can optimize this bit as well by copying entries from the treeset directly? And then maybe we can drop the requirement that the current accumulator is empty?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1445677720


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99SkipWriter.java:
##########
@@ -202,7 +202,11 @@ protected void writeSkipData(int level, DataOutput skipBuffer) throws IOExceptio
     CompetitiveImpactAccumulator competitiveFreqNorms = curCompetitiveFreqNorms[level];
     assert competitiveFreqNorms.getCompetitiveFreqNormPairs().size() > 0;
     if (level + 1 < numberOfSkipLevels) {
-      curCompetitiveFreqNorms[level + 1].addAll(competitiveFreqNorms);
+      if (curCompetitiveFreqNorms[level + 1].isEmpty()) {
+        curCompetitiveFreqNorms[level + 1].copy(competitiveFreqNorms);
+      } else {
+        curCompetitiveFreqNorms[level + 1].addAll(competitiveFreqNorms);
+      }

Review Comment:
   Yes, This optimization only affects non-zero level acc.
   I measured the called count with TestTermScore.testRandomTopDocs (numDocs = TEST_NIGHTLY):
   Method | count 
   -- | -- 
   copy zero level (bufferSkip) | 1270
   copy non zero (writeSkipData) | 214
   addAll (writeSkipData) | 1196 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1447359065


##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,20 @@ public void addAll(CompetitiveImpactAccumulator acc) {
     assert assertConsistent();
   }
 
+  /** Copy {@code acc} into this empty acc. */
+  public void copy(CompetitiveImpactAccumulator acc) {
+    assert Arrays.stream(maxFreqs).sum() == 0;
+    assert otherFreqNormPairs.isEmpty();

Review Comment:
   Sorry I have not been clear in previous comments, but the thing that bugs me a bit is this assertion that the accumulator is currently empty that gives a contract to this method that is surprising to me: why should the object currently be empty? When I look at other `copy()` methods we have, like `BytesRefBuilder#copy`, they don't expect the current object to be empty, it is expected to behave like a `clear()` followed by an `append()` (or `addAll()` for this accumulator). And it looks like we can easily have a similar contract here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1829969643

   Sorry I wasn't clear, I meant to replace entries of the treeset with entries of the other treeset by clearing it first, and then doing an `addAll`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1842384044

   > is this new approach helping cover more cases maybe?
    
   Yes, Previous patch only apply copy for level 0 's acc in ``Lucene99SkipWriter.bufferSkip``,  Current patch apply copy for all level's acc in ``Lucene99SkipWriter.writeSkipData`` by adding a flag to indicates wether this is an empty acc.
   
   > when knowing when the accumulator is empty is the responsibility of the caller rather than CompetitiveFreqNormAccumulator.
   
   I will move the copy/addAll selection to caller.
   
   Before that, there are some performance data I want to share:
   
   First of all, I measured method execution time(without assertion):
   Method | ns | diff
   -- | -- | --
   addAll | 48291 | -
   copy | 17000 |  +64.7%
   
   And add the implementation to JMH to enable JIT's optimization(auto-unroll, auto-vectorization):
   ````
   Benchmark                            (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.accAddAll           256  thrpt   15  17.311 ± 4.468  ops/us
   VectorUtilBenchmark.accCopy             256  thrpt   15  21.417 ± 7.009  ops/us
   
   Benchmark                            (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.accAddAll           256  thrpt   15  19.943 ± 4.831  ops/us
   VectorUtilBenchmark.accCopy             256  thrpt   15  16.516 ± 1.904  ops/us
   ````
   
   The speedup is not stable, so I split the implementation into array part and treeSet part.
   1. Array part: 
   ````
   Benchmark                              (size)   Mode  Cnt    Score    Error   Units
   VectorUtilBenchmark.arrayCopy             256  thrpt   15  141.356 ±  3.640  ops/us 
   VectorUtilBenchmark.arrayMax              256  thrpt   15   77.875 ±  1.310  ops/us 
   
   Benchmark                              (size)   Mode  Cnt    Score    Error   Units
   VectorUtilBenchmark.arrayCopy             256  thrpt   15  138.484 ±  5.038  ops/us
   VectorUtilBenchmark.arrayMax              256  thrpt   15   78.250 ±  0.992  ops/us
   ````
   ArrayMax is the baseline that iterates the array and get the max value. 
   ArrayCopy uses ``System.arraycopy``, which get a explicit speedup.
   
   2. TreeSet part:
   ````
   Benchmark                          (size)   Mode  Cnt   Score    Error   Units
   VectorUtilBenchmark.treeAddAll        256  thrpt   15  27.673 ± 19.096  ops/us
   VectorUtilBenchmark.treeScalarAdd     256  thrpt   15  34.888 ± 14.321  ops/us
   
   Benchmark                          (size)   Mode  Cnt   Score    Error   Units
   VectorUtilBenchmark.treeAddAll        256  thrpt   15  25.342 ±  4.151  ops/us
   VectorUtilBenchmark.treeScalarAdd     256  thrpt   15  42.673 ± 19.396  ops/us
   ````
   I also measured the method performance without JMH:
   Method | ns | diff
   -- | -- | --
   treeScalarAdd | 36000 | -
   treeAddAll | 26875 |  +25.3%
   
   The method performance indicates that `TreeSet.addAll` get a better performance, but JMH benchmark result indicates that scalarAdd(baseline) get a better performance than `TreeSet.addAll`, which is unexpected for me. I am still working on 
   finding out the reason. 
   
   Finally, I modified `acc.copy` to just copy the array but use the original iterate add for treeSet, and measured the performance: 
   ````
   Benchmark                            (size)   Mode  Cnt   Score    Error   Units
   VectorUtilBenchmark.accAddAll           256  thrpt   15  16.180 ±  3.014  ops/us
   VectorUtilBenchmark.accCopy             256  thrpt   15  31.807 ± 12.811  ops/us
   
   Benchmark                            (size)   Mode  Cnt   Score   Error   Units
   VectorUtilBenchmark.accAddAll           256  thrpt   15  20.414 ± 3.693  ops/us
   VectorUtilBenchmark.accCopy             256  thrpt   15  24.432 ± 6.862  ops/us
   ````
   Now `acc.copy`  get a speedup.
   Any idea about that? @jpountz 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1886335529

   @jpountz 
   I removed the contract of empty on acc. Please take a look when you get a chance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1405891395


##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,21 @@ public void addAll(CompetitiveImpactAccumulator acc) {
     assert assertConsistent();
   }
 
+  /** Copy {@code acc} into this empty acc. */
+  public void copy(CompetitiveImpactAccumulator acc) {
+    int[] maxFreqs = this.maxFreqs;
+    int[] otherMaxFreqs = acc.maxFreqs;
+    assert Arrays.stream(maxFreqs).sum() == 0;
+
+    System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
+
+    for (Impact entry : acc.otherFreqNormPairs) {
+      add(entry, otherFreqNormPairs);
+    }

Review Comment:
   > Did you observe a speedup with this change? 
   
   I measured it with TestTermScorer.testRandomTopDocs without assertion, the result it not stable.
   I will try other benchmarks.
   
   > Also can you add a test?
   > It looks like we can optimize this bit as well by copying entries from the treeset directly?
   
   Thanks for your suggestion, i am working on this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1831132910

   > replace entries of the treeset with entries of the other treeset by clearing it first, and then doing an addAll.
   
   Sorry, I am confused about this. If we clear an unEmpty treeset, how to deal with its competitive entries?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1880899505

   This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1829284442

   @jpountz 
   I added a test case for copy accumulator, and replaced iterate add with addAll for otherFreqNormPairs. Please take a look when you get a chance.
   
   https://github.com/apache/lucene/blob/9574cbd1f18dc751c98e09cc1cfc3da5ecacaca8/lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java#L134-L143
   If I am not mistaken, the further entries(after break) may have more lesser freq and greater norm, should we move them? 
   
   > And then maybe we can drop the requirement that the current accumulator is empty?
   
   Do you mean copy the treeset's all entries regardless wether they are competitice, into current  accumulator wether it is empty or not, and does not remove less competitive entries which implemented in add phase?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1832112426

   > Sorry, I am still confused about this. If we clear an unEmpty treeset, how to deal with its competitive entries?
   
   This looks similar to the arraycopy that you perform on the `maxFreqs` array, ignoring existing values? We would just need to update the contract of this method to say something like `Copy the provided accumulator into this accumulator, ignoring its existing state. This is effectively the same as calling clear() then addAll().`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1447008409


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99SkipWriter.java:
##########
@@ -202,7 +202,11 @@ protected void writeSkipData(int level, DataOutput skipBuffer) throws IOExceptio
     CompetitiveImpactAccumulator competitiveFreqNorms = curCompetitiveFreqNorms[level];
     assert competitiveFreqNorms.getCompetitiveFreqNormPairs().size() > 0;
     if (level + 1 < numberOfSkipLevels) {
-      curCompetitiveFreqNorms[level + 1].addAll(competitiveFreqNorms);
+      if (curCompetitiveFreqNorms[level + 1].isEmpty()) {
+        curCompetitiveFreqNorms[level + 1].copy(competitiveFreqNorms);
+      } else {
+        curCompetitiveFreqNorms[level + 1].addAll(competitiveFreqNorms);
+      }

Review Comment:
   @jpountz Should we remove this optimization to make code more clearer?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz merged PR #12846:
URL: https://github.com/apache/lucene/pull/12846


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org