Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/13 13:05:38 UTC

[GitHub] [lucene] gsmiller opened a new pull request, #11768: Fix tie-break bug in various Facets implementations

gsmiller opened a new pull request, #11768:
URL: https://github.com/apache/lucene/pull/11768

   ### Description
   
   There are a number of places in `Facets` implementations where `getTopChildren` is incorrectly handling count/value ties. The behavior should prefer smaller ordinals when counts/values are equal, but it's not always doing that. Some tests were also incorrect and needed to be updated.
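As a rough illustration of the intended behavior (not the code from this PR), the sketch below selects the top N ordinals by count and prefers the smaller ordinal when counts are equal. The class `TieBreakSketch`, the holder `OrdAndCount`, and the method `topN` are hypothetical names, and `java.util.PriorityQueue` stands in for Lucene's own priority queue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TieBreakSketch {
  /** Hypothetical holder for an ordinal and its count; not a Lucene class. */
  static final class OrdAndCount {
    final int ord;
    final int count;

    OrdAndCount(int ord, int count) {
      this.ord = ord;
      this.count = count;
    }
  }

  // Orders the weakest entry first: a lower count is weaker, and on equal
  // counts the LARGER ordinal is weaker, so smaller ordinals survive ties.
  static final Comparator<OrdAndCount> WEAKEST_FIRST =
      (a, b) -> {
        if (a.count != b.count) {
          return Integer.compare(a.count, b.count);
        }
        return Integer.compare(b.ord, a.ord);
      };

  /** Returns up to n entries with positive counts, strongest first. */
  static List<OrdAndCount> topN(int[] counts, int n) {
    PriorityQueue<OrdAndCount> pq = new PriorityQueue<>(WEAKEST_FIRST);
    for (int ord = 0; ord < counts.length; ord++) {
      if (counts[ord] <= 0) {
        continue; // skip ordinals with no hits
      }
      pq.add(new OrdAndCount(ord, counts[ord]));
      if (pq.size() > n) {
        pq.poll(); // evict the current weakest entry
      }
    }
    List<OrdAndCount> result = new ArrayList<>(pq);
    result.sort(WEAKEST_FIRST.reversed());
    return result;
  }
}
```

With counts `{0, 3, 5, 3, 5, 1}` and `n = 3`, ordinals 1 and 3 tie on a count of 3, and the sketch keeps ordinal 1 because it is smaller.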


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #11768: Fix tie-break bug in various Facets implementations

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #11768:
URL: https://github.com/apache/lucene/pull/11768#discussion_r980530138


##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java:
##########
@@ -189,10 +190,11 @@ private TopChildrenForPath getTopChildrenForPath(DimConfig dimConfig, int pathOr
 
     TopOrdAndFloatQueue.OrdAndValue reuse = null;
     while (ord != TaxonomyReader.INVALID_ORDINAL) {
-      if (values[ord] > 0) {
+      float value = values[ord];
+      if (value > 0) {
         aggregatedValue = aggregationFunction.aggregate(aggregatedValue, values[ord]);

Review Comment:
   Thanks for the catch. I intended to do this everywhere, but it looks like I missed a spot. I'll go back through and re-check.





[GitHub] [lucene] gsmiller commented on pull request #11768: Fix tie-break bug in various Facets implementations

Posted by GitBox <gi...@apache.org>.
gsmiller commented on PR #11768:
URL: https://github.com/apache/lucene/pull/11768#issuecomment-1255483139

   @Yuti-G could you help me understand what faceting implementation or part of the code you're referring to? Thanks!




[GitHub] [lucene] gsmiller commented on pull request #11768: Fix tie-break bug in various Facets implementations

Posted by GitBox <gi...@apache.org>.
gsmiller commented on PR #11768:
URL: https://github.com/apache/lucene/pull/11768#issuecomment-1255562840

   @Yuti-G thanks for the links. In this case, the contract is that we break ties by the value (of the long) itself (low-to-high), which the PQ is already doing. So this appears to be correct to me, but let me know if I'm overlooking something. Also, it's not possible to have identical values between two results, since the counting structures guarantee unique indexes/keys, right?
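A minimal sketch of that argument, assuming the queue's comparison has the two-clause form quoted elsewhere in this thread (`a.count < b.count || (a.count == b.count && a.value > b.value)`). The class `ValueTieBreakSketch`, its `Entry` holder, and `sortByCount` are hypothetical names, and a `Map` stands in for the counting structure because its keys are unique by construction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ValueTieBreakSketch {
  /** Hypothetical holder for a long value and its count; not a Lucene class. */
  static final class Entry {
    final long value;
    final int count;

    Entry(long value, int count) {
      this.value = value;
      this.count = count;
    }
  }

  // a is "less than" (weaker than) b if it has a lower count, or the same
  // count and a higher value, so ties resolve low-to-high by value.
  static boolean lessThan(Entry a, Entry b) {
    return a.count < b.count || (a.count == b.count && a.value > b.value);
  }

  /** Returns the entries strongest first. */
  static List<Entry> sortByCount(Map<Long, Integer> counts) {
    List<Entry> entries = new ArrayList<>();
    for (Map.Entry<Long, Integer> e : counts.entrySet()) {
      entries.add(new Entry(e.getKey(), e.getValue()));
    }
    // Map keys are unique, so two distinct entries never share a value and
    // exactly one of lessThan(a, b) or lessThan(b, a) holds for them.
    entries.sort((a, b) -> lessThan(a, b) ? 1 : (lessThan(b, a) ? -1 : 0));
    return entries;
  }
}
```

Because two distinct entries can never share both count and value under this assumption, a third clause comparing an index would never be reached.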




[GitHub] [lucene] gsmiller commented on a diff in pull request #11768: Fix tie-break bug in various Facets implementations

Posted by GitBox <gi...@apache.org>.
gsmiller commented on code in PR #11768:
URL: https://github.com/apache/lucene/pull/11768#discussion_r980530440


##########
lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java:
##########
@@ -626,7 +626,7 @@ public void testBasicWithCollectorManager() throws Exception {
     List<FacetResult> topNDimsResult = r.facets.getTopDims(1, 2);
     assertEquals(1, topNDimsResult.size());
     assertEquals(
-        "dim=Author path=[] value=5 childCount=4\n  Lisa (2)\n  Susan (1)\n",

Review Comment:
   Yeah, hindsight is 20/20 I suppose. We'll eventually converge on something that's correct :)





[GitHub] [lucene] gsmiller merged pull request #11768: Fix tie-break bug in various Facets implementations

Posted by GitBox <gi...@apache.org>.
gsmiller merged PR #11768:
URL: https://github.com/apache/lucene/pull/11768




[GitHub] [lucene] Yuti-G commented on pull request #11768: Fix tie-break bug in various Facets implementations

Posted by GitBox <gi...@apache.org>.
Yuti-G commented on PR #11768:
URL: https://github.com/apache/lucene/pull/11768#issuecomment-1255500611

   Sure, I just updated the previous comment with links. Thanks!




[GitHub] [lucene] Yuti-G commented on pull request #11768: Fix tie-break bug in various Facets implementations

Posted by GitBox <gi...@apache.org>.
Yuti-G commented on PR #11768:
URL: https://github.com/apache/lucene/pull/11768#issuecomment-1255662264

   I see.. Thanks for the explanation of indexes!




[GitHub] [lucene] Yuti-G commented on pull request #11768: Fix tie-break bug in various Facets implementations

Posted by GitBox <gi...@apache.org>.
Yuti-G commented on PR #11768:
URL: https://github.com/apache/lucene/pull/11768#issuecomment-1255341964

   Thanks @gsmiller for discovering this issue! The changes look good to me.
   
   I am curious whether the `index` in `LongIntCursor` works similarly to `ordinals` in other faceting implementations. If so, do you think we should also return `a.count < b.count || (a.count == b.count && a.value > b.value) || (a.count == b.count && a.value == b.value && a.index < b.index)` in the `lessThan()` function of the PQ in `getTopChildrenSortByCount` in the `LongValueFacetCounts` class? Please let me know if I have misunderstood the `index` here. Thank you so much!




[GitHub] [lucene] shaie commented on a diff in pull request #11768: Fix tie-break bug in various Facets implementations

Posted by GitBox <gi...@apache.org>.
shaie commented on code in PR #11768:
URL: https://github.com/apache/lucene/pull/11768#discussion_r979357774


##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java:
##########
@@ -189,10 +190,11 @@ private TopChildrenForPath getTopChildrenForPath(DimConfig dimConfig, int pathOr
 
     TopOrdAndFloatQueue.OrdAndValue reuse = null;
     while (ord != TaxonomyReader.INVALID_ORDINAL) {
-      if (values[ord] > 0) {
+      float value = values[ord];
+      if (value > 0) {
         aggregatedValue = aggregationFunction.aggregate(aggregatedValue, values[ord]);

Review Comment:
   nit: might as well use `value` here too (and check whether you can replace `values[ord]` with `value` elsewhere)



##########
lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java:
##########
@@ -626,7 +626,7 @@ public void testBasicWithCollectorManager() throws Exception {
     List<FacetResult> topNDimsResult = r.facets.getTopDims(1, 2);
     assertEquals(1, topNDimsResult.size());
     assertEquals(
-        "dim=Author path=[] value=5 childCount=4\n  Lisa (2)\n  Susan (1)\n",

Review Comment:
   It's disturbing that these tests were "wrong" and we just let them be like that. I'm glad that you fixed them, but it makes me wonder whether we could have caught this bug earlier by scrutinizing these tests more closely.


