You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2021/07/30 15:08:06 UTC

[GitHub] [solr] spyk opened a new pull request #242: SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel

spyk opened a new pull request #242:
URL: https://github.com/apache/solr/pull/242


   https://issues.apache.org/jira/browse/SOLR-15569
   
   # Description
   Fixes an issue where the MultipleAdditiveTreesModel does not split correctly when the tree node value equals the split threshold. This was discovered while testing a translated XGBoost model for LTR and getting slightly different score results.
   
   *NOTE:* The previous logic split to the left and also added a NODE_SPLIT_SLACK that has been removed. Not sure if this original logic was part of another model, or served another purpose.
   
   # Solution
   
   - Changed split logic to split on the right subtree when the node value equals the threshold
   - Removed NODE_SPLIT_SLACK completely
   
   # Tests
   
   - Added TestMultipleAdditiveTreesModel.testMultipleAdditiveTreesSplitAtThreshold test to showcase the issue
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [X] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [X] I have created a Jira issue and added the issue ID to my pull request title.
   - [X] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [X] I have developed this patch against the `main` branch.
   - [X] I have run `./gradlew check`.
   - [X] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] spyk commented on a change in pull request #242: SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel

Posted by GitBox <gi...@apache.org>.
spyk commented on a change in pull request #242:
URL: https://github.com/apache/solr/pull/242#discussion_r683387756



##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.java
##########
@@ -292,10 +291,10 @@ private static float scoreNode(float[] featureVector, RegressionTreeNode regress
         return 0f;
       }
 
-      if (featureVector[regressionTreeNode.featureIndex] <= regressionTreeNode.threshold) {
-        regressionTreeNode = regressionTreeNode.left;
-      } else {
+      if (featureVector[regressionTreeNode.featureIndex] >= regressionTreeNode.threshold) {

Review comment:
       We run into case with a specific XGBoost model where a variable might take on 0 or 1 and the split condition being `1.0`. So, in that case if we go left by default the right path is never evaluated, which results in silently decreased model performance. 
   
   But, agree it's a rather big change, there might be issues with a) other tree models that split to the left b) backwards incompatibility, so @cpoerschke suggestion makes a lot of sense.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] spyk commented on a change in pull request #242: SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel

Posted by GitBox <gi...@apache.org>.
spyk commented on a change in pull request #242:
URL: https://github.com/apache/solr/pull/242#discussion_r698497679



##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.java
##########
@@ -292,10 +291,10 @@ private static float scoreNode(float[] featureVector, RegressionTreeNode regress
         return 0f;
       }
 
-      if (featureVector[regressionTreeNode.featureIndex] <= regressionTreeNode.threshold) {
-        regressionTreeNode = regressionTreeNode.left;
-      } else {
+      if (featureVector[regressionTreeNode.featureIndex] >= regressionTreeNode.threshold) {

Review comment:
       @diegoceccarelli not sure if there are different tree models that either split to left or right when equal. At least in the case of the XGBoost model, it seems to split to the right by default. I'll check if I can add that flag functionality on the PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] alessandrobenedetti commented on a change in pull request #242: SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel

Posted by GitBox <gi...@apache.org>.
alessandrobenedetti commented on a change in pull request #242:
URL: https://github.com/apache/solr/pull/242#discussion_r683276722



##########
File path: solr/contrib/ltr/src/test-files/modelExamples/multipleadditivetreesmodel_split_at_threshold.json
##########
@@ -0,0 +1,23 @@
+{
+    "class":"org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
+    "name":"multipleadditivetreesmodel_split_at_threshold",
+    "features":[
+        { "name": "constantScoreToForceMultipleAdditiveTreesScoreAllDocs"}
+    ],
+    "params":{
+        "trees": [
+            {
+                "weight" : "1f",
+                "root": {
+                    "feature": "constantScoreToForceMultipleAdditiveTreesScoreAllDocs",
+                    "threshold": "1.0f",

Review comment:
       we should verify this, I think branching is generated on <=left and > right in the training algorithm




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] alessandrobenedetti commented on a change in pull request #242: SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel

Posted by GitBox <gi...@apache.org>.
alessandrobenedetti commented on a change in pull request #242:
URL: https://github.com/apache/solr/pull/242#discussion_r683276114



##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.java
##########
@@ -292,10 +291,10 @@ private static float scoreNode(float[] featureVector, RegressionTreeNode regress
         return 0f;
       }
 
-      if (featureVector[regressionTreeNode.featureIndex] <= regressionTreeNode.threshold) {
-        regressionTreeNode = regressionTreeNode.left;
-      } else {
+      if (featureVector[regressionTreeNode.featureIndex] >= regressionTreeNode.threshold) {

Review comment:
       I think we should keep it consistent, isn't <= in XGboost and similar?
   This is a pretty big change, and I think it shouldn't be necssary

##########
File path: solr/contrib/ltr/src/test-files/modelExamples/multipleadditivetreesmodel_split_at_threshold.json
##########
@@ -0,0 +1,23 @@
+{
+    "class":"org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
+    "name":"multipleadditivetreesmodel_split_at_threshold",
+    "features":[
+        { "name": "constantScoreToForceMultipleAdditiveTreesScoreAllDocs"}
+    ],
+    "params":{
+        "trees": [
+            {
+                "weight" : "1f",
+                "root": {
+                    "feature": "constantScoreToForceMultipleAdditiveTreesScoreAllDocs",
+                    "threshold": "1.0f",

Review comment:
       we should verify this, I think branching is generated on <=left and > right

##########
File path: solr/contrib/ltr/src/test-files/modelExamples/multipleadditivetreesmodel_split_at_threshold.json
##########
@@ -0,0 +1,23 @@
+{
+    "class":"org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
+    "name":"multipleadditivetreesmodel_split_at_threshold",
+    "features":[
+        { "name": "constantScoreToForceMultipleAdditiveTreesScoreAllDocs"}
+    ],
+    "params":{
+        "trees": [
+            {
+                "weight" : "1f",
+                "root": {
+                    "feature": "constantScoreToForceMultipleAdditiveTreesScoreAllDocs",
+                    "threshold": "1.0f",

Review comment:
       we should verify this, I think branching is generated on <=left and > right in the training algorithm




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] cpoerschke commented on a change in pull request #242: SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel

Posted by GitBox <gi...@apache.org>.
cpoerschke commented on a change in pull request #242:
URL: https://github.com/apache/solr/pull/242#discussion_r683304620



##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.java
##########
@@ -292,10 +291,10 @@ private static float scoreNode(float[] featureVector, RegressionTreeNode regress
         return 0f;
       }
 
-      if (featureVector[regressionTreeNode.featureIndex] <= regressionTreeNode.threshold) {
-        regressionTreeNode = regressionTreeNode.left;
-      } else {
+      if (featureVector[regressionTreeNode.featureIndex] >= regressionTreeNode.threshold) {

Review comment:
       The `==` going left or right is an interesting question! One way to maintain backwards compatibility whilst supporting "go right instead of left" could be via an optional configuration element e.g.
   
   ```
   - "params" : { "trees" : [ ... ] }
   + "params" : { "trees" : [ ... ], "splitToRight" : true }
   ```
   
   And if there is a `scoreNode` change then `explainNode` requires a matching change I think.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] diegoceccarelli commented on a change in pull request #242: SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel

Posted by GitBox <gi...@apache.org>.
diegoceccarelli commented on a change in pull request #242:
URL: https://github.com/apache/solr/pull/242#discussion_r689056660



##########
File path: solr/contrib/ltr/src/test/org/apache/solr/ltr/model/TestMultipleAdditiveTreesModel.java
##########
@@ -249,5 +274,5 @@ public void multipleAdditiveTreesTestUnknownFeature(){
     });
     assertEquals(expectedException.toString(), ex.toString());
   }
- 
+

Review comment:
       unintended? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] alessandrobenedetti commented on a change in pull request #242: SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel

Posted by GitBox <gi...@apache.org>.
alessandrobenedetti commented on a change in pull request #242:
URL: https://github.com/apache/solr/pull/242#discussion_r683276114



##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.java
##########
@@ -292,10 +291,10 @@ private static float scoreNode(float[] featureVector, RegressionTreeNode regress
         return 0f;
       }
 
-      if (featureVector[regressionTreeNode.featureIndex] <= regressionTreeNode.threshold) {
-        regressionTreeNode = regressionTreeNode.left;
-      } else {
+      if (featureVector[regressionTreeNode.featureIndex] >= regressionTreeNode.threshold) {

Review comment:
       I think we should keep it consistent, isn't <= in XGboost and similar?
   This is a pretty big change, and I think it shouldn't be necssary




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] diegoceccarelli commented on a change in pull request #242: SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel

Posted by GitBox <gi...@apache.org>.
diegoceccarelli commented on a change in pull request #242:
URL: https://github.com/apache/solr/pull/242#discussion_r689056500



##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.java
##########
@@ -292,10 +291,10 @@ private static float scoreNode(float[] featureVector, RegressionTreeNode regress
         return 0f;
       }
 
-      if (featureVector[regressionTreeNode.featureIndex] <= regressionTreeNode.threshold) {
-        regressionTreeNode = regressionTreeNode.left;
-      } else {
+      if (featureVector[regressionTreeNode.featureIndex] >= regressionTreeNode.threshold) {

Review comment:
       I think `==` should be always on the left. But I agree with `splitToRight` for backward compatibility. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] alessandrobenedetti commented on a change in pull request #242: SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel

Posted by GitBox <gi...@apache.org>.
alessandrobenedetti commented on a change in pull request #242:
URL: https://github.com/apache/solr/pull/242#discussion_r683276722



##########
File path: solr/contrib/ltr/src/test-files/modelExamples/multipleadditivetreesmodel_split_at_threshold.json
##########
@@ -0,0 +1,23 @@
+{
+    "class":"org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
+    "name":"multipleadditivetreesmodel_split_at_threshold",
+    "features":[
+        { "name": "constantScoreToForceMultipleAdditiveTreesScoreAllDocs"}
+    ],
+    "params":{
+        "trees": [
+            {
+                "weight" : "1f",
+                "root": {
+                    "feature": "constantScoreToForceMultipleAdditiveTreesScoreAllDocs",
+                    "threshold": "1.0f",

Review comment:
       we should verify this, I think branching is generated on <=left and > right




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] spyk commented on pull request #242: SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel

Posted by GitBox <gi...@apache.org>.
spyk commented on pull request #242:
URL: https://github.com/apache/solr/pull/242#issuecomment-926834765


   I took a stab on the 'splitToRight' flag implementation. Please review and let me know of any issues, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] cpoerschke commented on a change in pull request #242: SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel

Posted by GitBox <gi...@apache.org>.
cpoerschke commented on a change in pull request #242:
URL: https://github.com/apache/solr/pull/242#discussion_r683304620



##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.java
##########
@@ -292,10 +291,10 @@ private static float scoreNode(float[] featureVector, RegressionTreeNode regress
         return 0f;
       }
 
-      if (featureVector[regressionTreeNode.featureIndex] <= regressionTreeNode.threshold) {
-        regressionTreeNode = regressionTreeNode.left;
-      } else {
+      if (featureVector[regressionTreeNode.featureIndex] >= regressionTreeNode.threshold) {

Review comment:
       The `==` going left or right is an interesting question! One way to maintain backwards compatibility whilst supporting "go right instead of left" could be via an optional configuration element e.g.
   
   ```
   - "params" : { "trees" : [ ... ] }
   + "params" : { "trees" : [ ... ], "splitToRight" : true }
   ```
   
   And if there is a `scoreNode` change then `explainNode` requires a matching change I think.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] spyk commented on a change in pull request #242: SOLR-15569: Split on the right subtree when node value == threshold on MultipleAdditiveTreesModel

Posted by GitBox <gi...@apache.org>.
spyk commented on a change in pull request #242:
URL: https://github.com/apache/solr/pull/242#discussion_r698497679



##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.java
##########
@@ -292,10 +291,10 @@ private static float scoreNode(float[] featureVector, RegressionTreeNode regress
         return 0f;
       }
 
-      if (featureVector[regressionTreeNode.featureIndex] <= regressionTreeNode.threshold) {
-        regressionTreeNode = regressionTreeNode.left;
-      } else {
+      if (featureVector[regressionTreeNode.featureIndex] >= regressionTreeNode.threshold) {

Review comment:
       @diegoceccarelli not sure if there are different tree models that either split to left or right when equal. At least in the case of the XGBoost model, it seems to split to the right by default. I'll check if I can add that flag functionality on the PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org