Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/08/20 01:08:01 UTC

[GitHub] [lucene] dungba88 opened a new pull request #254: LUCENE-10059: Fix AssertionError in JapaneseTokenizer backtrace

dungba88 opened a new pull request #254:
URL: https://github.com/apache/lucene/pull/254


   # Description
   
   There is an issue that causes an `AssertionError` in the backtrace step of `JapaneseTokenizer`. If the text contains a span of 1024 characters (determined by `MAX_BACKTRACE_GAP`) in which the regular backtrace is never triggered, a forced backtrace is applied. If the partial best path at that point happens to end at the last position, the final backtrace, which is always applied at the end of the input, will try to backtrace from and to the same position. This causes an `AssertionError` in `RollingCharBuffer.get()`, which is asked to produce an empty buffer.
   
   # Solution
   
   Since the `backtrace()` method is essentially a no-op when the from and to positions are the same, we can skip it by returning early.
   
   ```
       if (endPos == lastBackTracePos) {
         return;
       }
   ```
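   
   To make the failure mode concrete, here is a minimal, self-contained sketch of the guard. It is a toy model, not the Lucene code: the class `BacktraceGuardSketch`, its `get()` helper, and the `guardEnabled` flag are hypothetical names invented for illustration, and the explicit `throw` stands in for the assertion in `RollingCharBuffer.get()` so the sketch behaves the same with or without `-ea`.
   
   ```java
   import java.util.Arrays;
   
   // Toy model only: none of these names exist in Lucene.
   final class BacktraceGuardSketch {
     private final char[] text;
     private final boolean guardEnabled;
     private int lastBackTracePos = 0;
   
     BacktraceGuardSketch(char[] text, boolean guardEnabled) {
       this.text = text;
       this.guardEnabled = guardEnabled;
     }
   
     // Stand-in for a buffer read that refuses to return an empty range.
     private char[] get(int start, int length) {
       if (length <= 0) {
         throw new AssertionError("length must be > 0, got " + length);
       }
       return Arrays.copyOfRange(text, start, start + length);
     }
   
     // Returns the number of characters consumed by this backtrace.
     int backtrace(int endPos) {
       // The guard: nothing to do when from and to positions coincide.
       if (guardEnabled && endPos == lastBackTracePos) {
         return 0;
       }
       char[] fragment = get(lastBackTracePos, endPos - lastBackTracePos);
       lastBackTracePos = endPos;
       return fragment.length;
     }
   
     public static void main(String[] args) {
       BacktraceGuardSketch fixed = new BacktraceGuardSketch("abcdef".toCharArray(), true);
       System.out.println(fixed.backtrace(6)); // forced backtrace reaches the last position
       System.out.println(fixed.backtrace(6)); // final backtrace is skipped by the guard
     }
   }
   ```
   
   With `guardEnabled` set to `false`, the second call in this toy model reaches `get()` with a length of 0 and throws, which mirrors the situation described above.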
   
   # Tests
   
   A new test (`testEmptyBacktrace`) is added to reproduce the issue and confirm the fix. The test creates an input of length 1025, where the first 1023 characters generate multiple paths to ensure the regular backtrace won't be called. The last 2 characters form a valid dictionary word to ensure the forced backtrace ends at the last position.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dungba88 commented on pull request #254: LUCENE-10059: Fix AssertionError in JapaneseTokenizer backtrace

Posted by GitBox <gi...@apache.org>.
dungba88 commented on pull request #254:
URL: https://github.com/apache/lucene/pull/254#issuecomment-902748267


   I've opened a backport PR for 8.x: https://github.com/apache/lucene-solr/pull/2557




[GitHub] [lucene] mikemccand commented on pull request #254: LUCENE-10059: Fix AssertionError in JapaneseTokenizer backtrace

Posted by GitBox <gi...@apache.org>.
mikemccand commented on pull request #254:
URL: https://github.com/apache/lucene/pull/254#issuecomment-902651596


   In the meantime, @dungba88 could you please also open a backport PR for Lucene/Solr 8.x?  It is a different git repository (https://github.com/apache/lucene-solr/tree/branch_8x), but if you do some git remote magic you can cherry-pick this commit over.  Or just diff and apply the patch.




[GitHub] [lucene] dungba88 commented on pull request #254: LUCENE-10059: Fix AssertionError in JapaneseTokenizer backtrace

Posted by GitBox <gi...@apache.org>.
dungba88 commented on pull request #254:
URL: https://github.com/apache/lucene/pull/254#issuecomment-902731868


   @mikemccand Thanks for reviewing and merging! I'll prepare a backport PR for version 8.x




[GitHub] [lucene] mikemccand merged pull request #254: LUCENE-10059: Fix AssertionError in JapaneseTokenizer backtrace

Posted by GitBox <gi...@apache.org>.
mikemccand merged pull request #254:
URL: https://github.com/apache/lucene/pull/254


   

