You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "vsop-479 (via GitHub)" <gi...@apache.org> on 2023/11/27 06:56:23 UTC
[PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
vsop-479 opened a new pull request, #12846:
URL: https://github.com/apache/lucene/pull/12846
We could copy collected acc into buffered acc, rather than merge them, in bufferSkip phase.
Because the target CompetitiveImpactAccumulator is cleared in last writeSkipData phase.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1444694744
##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,21 @@ public void addAll(CompetitiveImpactAccumulator acc) {
assert assertConsistent();
}
+ /** Copy {@code acc} into this empty acc. */
+ public void copy(CompetitiveImpactAccumulator acc) {
+ int[] maxFreqs = this.maxFreqs;
+ int[] otherMaxFreqs = acc.maxFreqs;
+ assert Arrays.stream(maxFreqs).sum() == 0;
+
+ System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
+
+ for (Impact entry : acc.otherFreqNormPairs) {
+ add(entry, otherFreqNormPairs);
+ }
Review Comment:
Let's remove the requirement that the accumulator is empty, it doesn't seem necessary? We can then simply implement this copy logic like that?
```
System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
otherFreqNormPairs.clear();
otherFreqNormPairs.addAll(other.otherFreqNormPairs);
```
And we can remove the `empty` flag on this class?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1444698183
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99SkipWriter.java:
##########
@@ -202,7 +202,11 @@ protected void writeSkipData(int level, DataOutput skipBuffer) throws IOExceptio
CompetitiveImpactAccumulator competitiveFreqNorms = curCompetitiveFreqNorms[level];
assert competitiveFreqNorms.getCompetitiveFreqNormPairs().size() > 0;
if (level + 1 < numberOfSkipLevels) {
- curCompetitiveFreqNorms[level + 1].addAll(competitiveFreqNorms);
+ if (curCompetitiveFreqNorms[level + 1].isEmpty()) {
+ curCompetitiveFreqNorms[level + 1].copy(competitiveFreqNorms);
+ } else {
+ curCompetitiveFreqNorms[level + 1].addAll(competitiveFreqNorms);
+ }
Review Comment:
This optimization doesn't look like it would kick in often, does it?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1447075161
##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,21 @@ public void addAll(CompetitiveImpactAccumulator acc) {
assert assertConsistent();
}
+ /** Copy {@code acc} into this empty acc. */
+ public void copy(CompetitiveImpactAccumulator acc) {
+ int[] maxFreqs = this.maxFreqs;
+ int[] otherMaxFreqs = acc.maxFreqs;
+ assert Arrays.stream(maxFreqs).sum() == 0;
+
+ System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
+
+ for (Impact entry : acc.otherFreqNormPairs) {
+ add(entry, otherFreqNormPairs);
+ }
Review Comment:
> And we can remove the empty flag on this class?
The empty flag is used to indicate wether non-zero level acc is empty, but the optimization for non-zero level acc doesn't kick in often, so I removed it.
I think ``otherFreqNormPairs.clear()`` is unnecessary, because ``curCompetitiveFreqNorms[0]`` is cleared in ``writeSkipData``.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1833216581
@jpountz I applied copy for all level's empty acc, Please take a look when you get a chance.
As so far I just performed copy on empty acc that cleared by ``clear``. Do you mean we could skip the ``clear`` (mainly ``Arrays.fill(maxFreqs, 0)``) after ``writeSkipData``, and override the stale acc in next writing directly?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1448225023
##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,20 @@ public void addAll(CompetitiveImpactAccumulator acc) {
assert assertConsistent();
}
+ /** Copy {@code acc} into this empty acc. */
+ public void copy(CompetitiveImpactAccumulator acc) {
+ assert Arrays.stream(maxFreqs).sum() == 0;
+ assert otherFreqNormPairs.isEmpty();
Review Comment:
> why should the object currently be empty?
I think the caller should ensure current acc is empty when calling ``copy()``. If current acc is not empty, the caller should call ``CompetitiveImpactAccumulator.addAll()`` to merge them.
But you are right, it is the caller's responsibility to decide to call which low level implementation. I will fix it!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1444694744
##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,21 @@ public void addAll(CompetitiveImpactAccumulator acc) {
assert assertConsistent();
}
+ /** Copy {@code acc} into this empty acc. */
+ public void copy(CompetitiveImpactAccumulator acc) {
+ int[] maxFreqs = this.maxFreqs;
+ int[] otherMaxFreqs = acc.maxFreqs;
+ assert Arrays.stream(maxFreqs).sum() == 0;
+
+ System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
+
+ for (Impact entry : acc.otherFreqNormPairs) {
+ add(entry, otherFreqNormPairs);
+ }
Review Comment:
Let's remove the requirement that the accumulator is empty, it doesn't seem necessary? We can then simply implement this copy logic like that?
```
System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
otherFreqNormPairs.clear();
otherFreqNormPairs.addAll(other.otherFreqNormPairs);
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1855623955
@jpountz
I moved the copy/addAll selection to the caller, and reverted treeSet's addAll to iterate add, since there is no speedup with `TreeSet.addAll` in JMH.
Please take a look when you get a chance!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1884449395
@jpountz
I removed the optimization for non-zero level acc, and the empty flag.
Please take a look when you get a chance!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1827252377
@jpountz Please take a look when you get a chance.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1405810450
##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,21 @@ public void addAll(CompetitiveImpactAccumulator acc) {
assert assertConsistent();
}
+ /** Copy {@code acc} into this empty acc. */
+ public void copy(CompetitiveImpactAccumulator acc) {
+ int[] maxFreqs = this.maxFreqs;
+ int[] otherMaxFreqs = acc.maxFreqs;
+ assert Arrays.stream(maxFreqs).sum() == 0;
+
+ System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
+
+ for (Impact entry : acc.otherFreqNormPairs) {
+ add(entry, otherFreqNormPairs);
+ }
Review Comment:
It looks like we can optimize this bit as well by copying entries from the treeset directly? And then maybe we can drop the requirement that the current accumulator is empty?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1445677720
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99SkipWriter.java:
##########
@@ -202,7 +202,11 @@ protected void writeSkipData(int level, DataOutput skipBuffer) throws IOExceptio
CompetitiveImpactAccumulator competitiveFreqNorms = curCompetitiveFreqNorms[level];
assert competitiveFreqNorms.getCompetitiveFreqNormPairs().size() > 0;
if (level + 1 < numberOfSkipLevels) {
- curCompetitiveFreqNorms[level + 1].addAll(competitiveFreqNorms);
+ if (curCompetitiveFreqNorms[level + 1].isEmpty()) {
+ curCompetitiveFreqNorms[level + 1].copy(competitiveFreqNorms);
+ } else {
+ curCompetitiveFreqNorms[level + 1].addAll(competitiveFreqNorms);
+ }
Review Comment:
Yes, This optimization only affects non-zero level acc.
I measured the called count with TestTermScore.testRandomTopDocs (numDocs = TEST_NIGHTLY):
Method | count
-- | --
copy zero level (bufferSkip) | 1270
copy non zero (writeSkipData) | 214
addAll (writeSkipData) | 1196
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1447359065
##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,20 @@ public void addAll(CompetitiveImpactAccumulator acc) {
assert assertConsistent();
}
+ /** Copy {@code acc} into this empty acc. */
+ public void copy(CompetitiveImpactAccumulator acc) {
+ assert Arrays.stream(maxFreqs).sum() == 0;
+ assert otherFreqNormPairs.isEmpty();
Review Comment:
Sorry I have not been clear in previous comments, but the thing that bugs me a bit is this assertion that the accumulator is currently empty that gives a contract to this method that is surprising to me: why should the object currently be empty? When I look at other `copy()` methods we have, like `BytesRefBuilder#copy`, they don't expect the current object to be empty, it is expected to behave like a `clear()` followed by an `append()` (or `addAll()` for this accumulator). And it looks like we can easily have a similar contract here?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1829969643
Sorry I wasn't clear, I meant to replace entries of the treeset with entries of the other treeset by clearing it first, and then doing an `addAll`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1842384044
> is this new approach helping cover more cases maybe?
Yes, Previous patch only apply copy for level 0 's acc in ``Lucene99SkipWriter.bufferSkip``, Current patch apply copy for all level's acc in ``Lucene99SkipWriter.writeSkipData`` by adding a flag to indicates wether this is an empty acc.
> when knowing when the accumulator is empty is the responsibility of the caller rather than CompetitiveFreqNormAccumulator.
I will move the copy/addAll selection to caller.
Before that, there are some performance data I want to share:
First of all, I measured method execution time(without assertion):
Method | ns | diff
-- | -- | --
addAll | 48291 | -
copy | 17000 | +64.7%
And add the implementation to JMH to enable JIT's optimization(auto-unroll, auto-vectorization):
````
Benchmark (size) Mode Cnt Score Error Units
VectorUtilBenchmark.accAddAll 256 thrpt 15 17.311 ± 4.468 ops/us
VectorUtilBenchmark.accCopy 256 thrpt 15 21.417 ± 7.009 ops/us
Benchmark (size) Mode Cnt Score Error Units
VectorUtilBenchmark.accAddAll 256 thrpt 15 19.943 ± 4.831 ops/us
VectorUtilBenchmark.accCopy 256 thrpt 15 16.516 ± 1.904 ops/us
````
The speedup is not stable, so I split the implementation into array part and treeSet part.
1. Array part:
````
Benchmark (size) Mode Cnt Score Error Units
VectorUtilBenchmark.arrayCopy 256 thrpt 15 141.356 ± 3.640 ops/us
VectorUtilBenchmark.arrayMax 256 thrpt 15 77.875 ± 1.310 ops/us
Benchmark (size) Mode Cnt Score Error Units
VectorUtilBenchmark.arrayCopy 256 thrpt 15 138.484 ± 5.038 ops/us
VectorUtilBenchmark.arrayMax 256 thrpt 15 78.250 ± 0.992 ops/us
````
ArrayMax is the baseline that iterates the array and get the max value.
ArrayCopy uses ``System.arraycopy``, which get a explicit speedup.
2. TreeSet part:
````
Benchmark (size) Mode Cnt Score Error Units
VectorUtilBenchmark.treeAddAll 256 thrpt 15 27.673 ± 19.096 ops/us
VectorUtilBenchmark.treeScalarAdd 256 thrpt 15 34.888 ± 14.321 ops/us
Benchmark (size) Mode Cnt Score Error Units
VectorUtilBenchmark.treeAddAll 256 thrpt 15 25.342 ± 4.151 ops/us
VectorUtilBenchmark.treeScalarAdd 256 thrpt 15 42.673 ± 19.396 ops/us
````
I also measured the method performance without JMH:
Method | ns | diff
-- | -- | --
treeScalarAdd | 36000 | -
treeAddAll | 26875 | +25.3%
The method performance indicates that `TreeSet.addAll` get a better performance, but JMH benchmark result indicates that scalarAdd(baseline) get a better performance than `TreeSet.addAll`, which is unexpected for me. I am still working on
finding out the reason.
Finally, I modified `acc.copy` to just copy the array but use the original iterate add for treeSet, and measured the performance:
````
Benchmark (size) Mode Cnt Score Error Units
VectorUtilBenchmark.accAddAll 256 thrpt 15 16.180 ± 3.014 ops/us
VectorUtilBenchmark.accCopy 256 thrpt 15 31.807 ± 12.811 ops/us
Benchmark (size) Mode Cnt Score Error Units
VectorUtilBenchmark.accAddAll 256 thrpt 15 20.414 ± 3.693 ops/us
VectorUtilBenchmark.accCopy 256 thrpt 15 24.432 ± 6.862 ops/us
````
Now `acc.copy` get a speedup.
Any idea about that? @jpountz
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1886335529
@jpountz
I removed the contract of empty on acc. Please take a look when you get a chance!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1405891395
##########
lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java:
##########
@@ -93,6 +93,21 @@ public void addAll(CompetitiveImpactAccumulator acc) {
assert assertConsistent();
}
+ /** Copy {@code acc} into this empty acc. */
+ public void copy(CompetitiveImpactAccumulator acc) {
+ int[] maxFreqs = this.maxFreqs;
+ int[] otherMaxFreqs = acc.maxFreqs;
+ assert Arrays.stream(maxFreqs).sum() == 0;
+
+ System.arraycopy(otherMaxFreqs, 0, maxFreqs, 0, maxFreqs.length);
+
+ for (Impact entry : acc.otherFreqNormPairs) {
+ add(entry, otherFreqNormPairs);
+ }
Review Comment:
> Did you observe a speedup with this change?
I measured it with TestTermScorer.testRandomTopDocs without assertion, the result it not stable.
I will try other benchmarks.
> Also can you add a test?
> It looks like we can optimize this bit as well by copying entries from the treeset directly?
Thanks for your suggestion, i am working on this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1831132910
> replace entries of the treeset with entries of the other treeset by clearing it first, and then doing an addAll.
Sorry, I am confused about this. If we clear an unEmpty treeset, how to deal with its competitive entries?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1880899505
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1829284442
@jpountz
I added a test case for copy accumulator, and replaced iterate add with addAll for otherFreqNormPairs. Please take a look when you get a chance.
https://github.com/apache/lucene/blob/9574cbd1f18dc751c98e09cc1cfc3da5ecacaca8/lucene/core/src/java/org/apache/lucene/codecs/CompetitiveImpactAccumulator.java#L134-L143
If I am not mistaken, the further entries(after break) may have more lesser freq and greater norm, should we move them?
> And then maybe we can drop the requirement that the current accumulator is empty?
Do you mean copy the treeset's all entries regardless wether they are competitice, into current accumulator wether it is empty or not, and does not remove less competitive entries which implemented in add phase?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1832112426
> Sorry, I am still confused about this. If we clear an unEmpty treeset, how to deal with its competitive entries?
This looks similar to the arraycopy that you perform on the `maxFreqs` array, ignoring existing values? We would just need to update the contract of this method to say something like `Copy the provided accumulator into this accumulator, ignoring its existing state. This is effectively the same as calling clear() then addAll().`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "vsop-479 (via GitHub)" <gi...@apache.org>.
vsop-479 commented on code in PR #12846:
URL: https://github.com/apache/lucene/pull/12846#discussion_r1447008409
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99SkipWriter.java:
##########
@@ -202,7 +202,11 @@ protected void writeSkipData(int level, DataOutput skipBuffer) throws IOExceptio
CompetitiveImpactAccumulator competitiveFreqNorms = curCompetitiveFreqNorms[level];
assert competitiveFreqNorms.getCompetitiveFreqNormPairs().size() > 0;
if (level + 1 < numberOfSkipLevels) {
- curCompetitiveFreqNorms[level + 1].addAll(competitiveFreqNorms);
+ if (curCompetitiveFreqNorms[level + 1].isEmpty()) {
+ curCompetitiveFreqNorms[level + 1].copy(competitiveFreqNorms);
+ } else {
+ curCompetitiveFreqNorms[level + 1].addAll(competitiveFreqNorms);
+ }
Review Comment:
@jpountz Should we remove this optimization to make code more clearer?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]
Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz merged PR #12846:
URL: https://github.com/apache/lucene/pull/12846
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org