Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/14 14:40:31 UTC

[GitHub] [lucene] kotman12 opened a new issue, #11771: KeywordRepeatFilter + OpenNLPLemmatizer Early Exit

kotman12 opened a new issue, #11771:
URL: https://github.com/apache/lucene/issues/11771

   ### Description
   
   Combining KeywordRepeatFilter with OpenNLPLemmatizer causes the token stream to exit arbitrarily early.
   
   Steps to reproduce: run this [test](https://github.com/kotman12/lucene/blob/illustrate-bug/lucene/analysis/opennlp/src/test/org/apache/lucene/analysis/opennlp/TestOpenNLPLemmatizerFilterFactory.java#L324) and notice how no text below [this line from the test file](https://github.com/kotman12/lucene/blob/illustrate-bug/lucene/analysis/opennlp/src/test-files/org/apache/lucene/analysis/opennlp/data/early-exit-bug-input.txt#L20) gets analyzed.
   
   The root cause appears to be [an extraneous exit condition](https://github.com/kotman12/lucene/blob/illustrate-bug/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPLemmatizerFilter.java#L75) that doesn't play nicely with KeywordRepeatFilter.
   
   This is related to bug #11735 and is addressed by #11734.
   
   ### Version and environment details
   
   Latest version of Lucene, running on JDK 17.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] kotman12 commented on issue #11771: KeywordRepeatFilter + OpenNLPLemmatizer Early Exit

Posted by GitBox <gi...@apache.org>.
kotman12 commented on issue #11771:
URL: https://github.com/apache/lucene/issues/11771#issuecomment-1256604231

   Very, very interesting .. will take a look


[GitHub] [lucene] dweiss commented on issue #11771: KeywordRepeatFilter + OpenNLPLemmatizer Early Exit

Posted by GitBox <gi...@apache.org>.
dweiss commented on issue #11771:
URL: https://github.com/apache/lucene/issues/11771#issuecomment-1256534557

   I can reproduce those failures with JDK11 but not with JDK17. I didn't look into this deeper.


[GitHub] [lucene] dweiss closed issue #11771: KeywordRepeatFilter + OpenNLPLemmatizer Early Exit

Posted by GitBox <gi...@apache.org>.
dweiss closed issue #11771: KeywordRepeatFilter + OpenNLPLemmatizer Early Exit
URL: https://github.com/apache/lucene/issues/11771


[GitHub] [lucene] dweiss commented on issue #11771: KeywordRepeatFilter + OpenNLPLemmatizer Early Exit

Posted by GitBox <gi...@apache.org>.
dweiss commented on issue #11771:
URL: https://github.com/apache/lucene/issues/11771#issuecomment-1256892005

   If this code went in the main branch then it's also a bug there. Comparing strings by reference is a no-no - I should have caught it earlier. I'll do the update on both branches later today.
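
For readers unfamiliar with the pitfall mentioned above, here is a minimal, generic sketch of why comparing Java strings by reference is unreliable. It is not the code from the patch under discussion, which this thread does not show; the class name `StringReferenceComparison` and both helper methods are hypothetical and exist only for illustration.

```java
public class StringReferenceComparison {

    // Hypothetical helper: compares object identity, not character content.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Hypothetical helper: compares character content, which is usually what is intended.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "lemma";
        // new String(...) always allocates a distinct object with the same characters.
        String built = new String("lemma");

        System.out.println(sameReference(literal, built));          // false
        System.out.println(sameContent(literal, built));            // true
        // intern() returns the pooled instance, which is the same object as the literal.
        System.out.println(sameReference(literal, built.intern())); // true
    }
}
```

Whether two equal strings happen to share one object depends on how they were created, so a `==` check can pass in one setup and fail in another; `equals` does not have that dependency.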


[GitHub] [lucene] dweiss commented on issue #11771: KeywordRepeatFilter + OpenNLPLemmatizer Early Exit

Posted by GitBox <gi...@apache.org>.
dweiss commented on issue #11771:
URL: https://github.com/apache/lucene/issues/11771#issuecomment-1256525265

   https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.x/3057/
   
   Hmm... this patch applied to 9x fails the tests. Could you take a look at that, @kotman12 ?


[GitHub] [lucene] kotman12 commented on issue #11771: KeywordRepeatFilter + OpenNLPLemmatizer Early Exit

Posted by GitBox <gi...@apache.org>.
kotman12 commented on issue #11771:
URL: https://github.com/apache/lucene/issues/11771#issuecomment-1256641710

   So [this change](https://github.com/apache/lucene/pull/11810/files) seems to fix the test **locally** for me in branch 9x. I created a PR for upstream; not sure how you want to handle the reversion in the 9x branch.

