You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucenenet.apache.org by GitBox <gi...@apache.org> on 2021/03/11 10:31:52 UTC

[GitHub] [lucenenet] rauhs opened a new pull request #438: Don't insert extra newline in TFIDFSim's score explanation

rauhs opened a new pull request #438:
URL: https://github.com/apache/lucenenet/pull/438


   Ref: https://github.com/apache/lucene-solr/commit/f0bfcbc7d8fbc5bb2791da60af559e8b0ad6eed6


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [lucenenet] rclabo commented on pull request #438: Don't insert extra newline in TFIDFSim's score explanation

Posted by GitBox <gi...@apache.org>.
rclabo commented on pull request #438:
URL: https://github.com/apache/lucenenet/pull/438#issuecomment-796744829


   I just came to my attention that the change provided in this PR is consistent with Java Lucene  5.5.  LuceneNet is of course is targeting 4.8 but where it makes sense, on occasion, we incorporate changes from a later version.  Now that I realize that 5.5 implements this change, it makes sense to me to accept this pull request since it feels like it fixes a bug that got fixed in 5.5.  Thank you for this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [lucenenet] NightOwl888 merged pull request #438: Don't insert extra newline in TFIDFSim's score explanation

Posted by GitBox <gi...@apache.org>.
NightOwl888 merged pull request #438:
URL: https://github.com/apache/lucenenet/pull/438


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [lucenenet] rclabo commented on a change in pull request #438: Don't insert extra newline in TFIDFSim's score explanation

Posted by GitBox <gi...@apache.org>.
rclabo commented on a change in pull request #438:
URL: https://github.com/apache/lucenenet/pull/438#discussion_r593510872



##########
File path: src/Lucene.Net/Search/Similarities/TFIDFSimilarity.cs
##########
@@ -751,7 +751,7 @@ public override void Normalize(float queryNorm, float topLevelBoost)
         private Explanation ExplainScore(int doc, Explanation freq, IDFStats stats, NumericDocValues norms)
         {
             Explanation result = new Explanation();
-            result.Description = "score(doc=" + doc + ",freq=" + freq + "), product of:";
+            result.Description = "score(doc=" + doc + ",freq=" + freq.Value + "), product of:";

Review comment:
       Instead of 5.x it'd be good if the comment said 5.0 since that was the specific version this where this change was introduced and it was a bit hard to track down given that area of the code was substantially refactored in 5.2.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [lucenenet] rclabo commented on pull request #438: Don't insert extra newline in TFIDFSim's score explanation

Posted by GitBox <gi...@apache.org>.
rclabo commented on pull request #438:
URL: https://github.com/apache/lucenenet/pull/438#issuecomment-796919577


   While I'd still love to know why the two branch_5x links have different code for the file (different subversions of 5?) I can now see that the PR does simply apply a small change that existed in [version 5.0 of java lucene](https://github.com/apache/lucene/blob/releases/lucene-solr%2F5.0.0/lucene/core/src/java/org/apache/lucene/search/similarities/TFIDFSimilarity.java#L770)  prior to the larger rework of the ExplainScore method that happened in [5.2](https://github.com/apache/lucene/blob/releases/lucene-solr%2F5.2.0/lucene/core/src/java/org/apache/lucene/search/similarities/TFIDFSimilarity.java#L770)  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [lucenenet] rclabo commented on a change in pull request #438: Don't insert extra newline in TFIDFSim's score explanation

Posted by GitBox <gi...@apache.org>.
rclabo commented on a change in pull request #438:
URL: https://github.com/apache/lucenenet/pull/438#discussion_r593510872



##########
File path: src/Lucene.Net/Search/Similarities/TFIDFSimilarity.cs
##########
@@ -751,7 +751,7 @@ public override void Normalize(float queryNorm, float topLevelBoost)
         private Explanation ExplainScore(int doc, Explanation freq, IDFStats stats, NumericDocValues norms)
         {
             Explanation result = new Explanation();
-            result.Description = "score(doc=" + doc + ",freq=" + freq + "), product of:";
+            result.Description = "score(doc=" + doc + ",freq=" + freq.Value + "), product of:";

Review comment:
       Instead of 5.x it'd be good if the comment said 5.0 since that was the specific version where this change was introduced and it was a bit hard to track down given that area of the code was substantially refactored in 5.2.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [lucenenet] NightOwl888 commented on pull request #438: Don't insert extra newline in TFIDFSim's score explanation

Posted by GitBox <gi...@apache.org>.
NightOwl888 commented on pull request #438:
URL: https://github.com/apache/lucenenet/pull/438#issuecomment-796781618


   @rauhs 
   
   This is a port of Lucene 4.8.0 (for the most part). Strictly speaking, we are supposed to be sticking with that version, but we have let a few patches slip in from later versions of Lucene because they would make it difficult to work with and are in places that are difficult or impossible to replace by end users.
   
   But this appears to be part of a larger design change to `TFIDFSimilarity` that occurred in Lucene 5.x, rather than an actual bug that needs to be fixed.
   
   Is there some reason you have for needing this change ahead of schedule?
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [lucenenet] rclabo commented on pull request #438: Don't insert extra newline in TFIDFSim's score explanation

Posted by GitBox <gi...@apache.org>.
rclabo commented on pull request #438:
URL: https://github.com/apache/lucenenet/pull/438#issuecomment-796910538


   OK, so I'm confused.  
   The source from the following two locations are very different and I thought they should be the same.  Based on the first link, the change is just a small patch, based on the 2nd link it's part of a much bigger rework of the ExplainScore method.
   
   https://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/lucene/core/src/java/org/apache/lucene/search/similarities/TFIDFSimilarity.java?view=markup&pathrev=1641847
   
   https://github.com/apache/lucene/blob/branch_5x/lucene/core/src/java/org/apache/lucene/search/similarities/TFIDFSimilarity.java
   
   Can anyone explain why these two versions of branch_5x are so different for this file?
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [lucenenet] NightOwl888 commented on pull request #438: Don't insert extra newline in TFIDFSim's score explanation

Posted by GitBox <gi...@apache.org>.
NightOwl888 commented on pull request #438:
URL: https://github.com/apache/lucenenet/pull/438#issuecomment-796919487


   @rclabo - That is a good question, and I wish I had a good answer :).
   
   The first link is for SVN and I suspect it is out of date as they are using Git now. Of course the SVN link says it was updated 6 years, 3 months ago and the Git link says it was updated Feb 2016. So that is at least a 2 year gap between these branches.
   
   Do note that branches may continually be patched, but tags are fixed points in time that you can tie to a specific release.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [lucenenet] NightOwl888 commented on a change in pull request #438: Don't insert extra newline in TFIDFSim's score explanation

Posted by GitBox <gi...@apache.org>.
NightOwl888 commented on a change in pull request #438:
URL: https://github.com/apache/lucenenet/pull/438#discussion_r593318298



##########
File path: src/Lucene.Net/Search/Similarities/TFIDFSimilarity.cs
##########
@@ -751,7 +751,7 @@ public override void Normalize(float queryNorm, float topLevelBoost)
         private Explanation ExplainScore(int doc, Explanation freq, IDFStats stats, NumericDocValues norms)
         {
             Explanation result = new Explanation();
-            result.Description = "score(doc=" + doc + ",freq=" + freq + "), product of:";
+            result.Description = "score(doc=" + doc + ",freq=" + freq.Value + "), product of:";

Review comment:
       Please add the following comment just before this line:
   
   ```
   // LUCENENET specific - using freq.Value is a change that was made in Lucene 5.x, but is included
   // in 4.8.0 to remove annoying newlines from the output.
   // See: https://github.com/apache/lucene-solr/commit/f0bfcbc7d8fbc5bb2791da60af559e8b0ad6eed6
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [lucenenet] rauhs commented on pull request #438: Don't insert extra newline in TFIDFSim's score explanation

Posted by GitBox <gi...@apache.org>.
rauhs commented on pull request #438:
URL: https://github.com/apache/lucenenet/pull/438#issuecomment-796879882


   First: Thanks for the quick feedback!
   
   There is no real bug. I have just noticed the annoying newline in my explanations for a few months of working with 4.8 lucene and today was the day that it has bothered me too much so I went into the git repos and realized it was fixed in the Java version. (See my link).
   
   So no, there is no real reason other than getting rid of this annoying newline everywhere which makes reading the explanation a bit easier.
   
   > But this appears to be part of a larger design change to TFIDFSimilarity that occurred in Lucene 5.x, rather than an actual bug that needs to be fixed.
   
   I don't think this change was part of that rework. Here it was just about the newline.
   
   I'm fine with a rejection here, no big deal to me. I can live with the newline. Whatever you prefer. Cheers


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org