
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2019/12/11 14:46:44 UTC

[GitHub] [lucene-solr] cbuescher commented on a change in pull request #1073: LUCENE-9088: JapaneseNumberFilter uses inaccurate PartOfSpeechAttribute

cbuescher commented on a change in pull request #1073: LUCENE-9088: JapaneseNumberFilter uses inaccurate PartOfSpeechAttribute
URL: https://github.com/apache/lucene-solr/pull/1073#discussion_r356639454
 
 

 ##########
 File path: lucene/analysis/kuromoji/src/java/org/apache/lucene/analysis/ja/JapaneseNumberFilter.java
 ##########
 @@ -218,6 +228,11 @@ public final boolean incrementToken() throws IOException {
         // capture the state of this token and emit it on our next incrementToken()
         state = captureState();
       }
+      // we restore state to when we read the last numeral token to get its attributes (e.g. part-of-speech)
+      if (lastNumeralTokenState != null) {
+        restoreState(lastNumeralTokenState);
 
 Review comment:
   Note: simply setting the PartOfSpeechAttribute to "noun-numeric" on the emitted token wasn't as straightforward as I expected, since the implementation wraps a whole `org.apache.lucene.analysis.ja.Token`. This is why I explored tracking and restoring the last "good" token's state here.
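
   To illustrate the idea outside of Lucene, here is a minimal, self-contained sketch. It is not the code in this PR: `SketchToken` and `NumberMergeSketch` are hypothetical names, the "noun-suffix" label is made up for the example, and a plain immutable object stands in for the state that `captureState()` / `restoreState()` handle in the real filter. It only shows why the merged number should take its attributes from the last numeral token rather than from whichever token was read last.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token and its attributes; not a Lucene class.
final class SketchToken {
    final String term;
    final String partOfSpeech;

    SketchToken(String term, String partOfSpeech) {
        this.term = term;
        this.partOfSpeech = partOfSpeech;
    }
}

// Hypothetical merger that joins consecutive numeral tokens into one token.
final class NumberMergeSketch {

    // Assumption for this sketch: a numeral token consists only of digits.
    static boolean isNumeral(SketchToken t) {
        if (t.term.isEmpty()) {
            return false;
        }
        for (int i = 0; i < t.term.length(); i++) {
            if (!Character.isDigit(t.term.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    static List<SketchToken> merge(List<SketchToken> input) {
        List<SketchToken> out = new ArrayList<>();
        StringBuilder number = new StringBuilder();
        // Tokens are immutable here, so keeping a reference acts as the snapshot
        // of the last "good" (numeral) token's attributes.
        SketchToken lastNumeralState = null;

        for (SketchToken t : input) {
            if (isNumeral(t)) {
                number.append(t.term);
                lastNumeralState = t;
                continue;
            }
            // A non-numeral token ends the run. Emit the merged number with the
            // attributes of the last numeral, not those of the token just read.
            if (lastNumeralState != null) {
                out.add(new SketchToken(number.toString(), lastNumeralState.partOfSpeech));
                number.setLength(0);
                lastNumeralState = null;
            }
            out.add(t);
        }
        // Flush a number that runs to the end of the input.
        if (lastNumeralState != null) {
            out.add(new SketchToken(number.toString(), lastNumeralState.partOfSpeech));
        }
        return out;
    }
}
```

   With the input tokens "1" and "2" (both "noun-numeric") followed by "yen" ("noun-suffix"), `NumberMergeSketch.merge` yields "12" tagged "noun-numeric" and then "yen" unchanged. Taking the attributes from the lookahead token instead would mislabel "12" as "noun-suffix".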
