Posted to commits@lucene.apache.org by ji...@apache.org on 2019/02/01 10:39:02 UTC

[lucene-solr] branch branch_8_0 updated: LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused by a big buffer (1024 chars).

This is an automated email from the ASF dual-hosted git repository.

jimczi pushed a commit to branch branch_8_0
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git


The following commit(s) were added to refs/heads/branch_8_0 by this push:
     new bae3e24  LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused by a big buffer (1024 chars).
bae3e24 is described below

commit bae3e24e8bcdac9a07d2b0592cba72bed2e5365e
Author: jimczi <ji...@apache.org>
AuthorDate: Fri Feb 1 11:37:16 2019 +0100

    LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused by a big buffer (1024 chars).
---
 lucene/CHANGES.txt                                                     | 3 +++
 .../nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java   | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt
index bd11e62..11c074e 100644
--- a/lucene/CHANGES.txt
+++ b/lucene/CHANGES.txt
@@ -293,6 +293,9 @@ Bug fixes:
   was not propagating final position increments from its child streams correctly.
   (Dan Meehl, Alan Woodward)
 
+* LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused
+  by a big buffer (1024 chars). (Jim Ferenczi)
+
 New Features
 
 * LUCENE-8026: ExitableDirectoryReader may now time out queries that run on
diff --git a/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java b/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java
index 012352c..8875fd0 100644
--- a/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java
+++ b/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java
@@ -535,7 +535,6 @@ public final class KoreanTokenizer extends Tokenizer {
       }
 
       if (pos > lastBackTracePos && posData.count == 1 && isFrontier) {
-        //  if (pos > lastBackTracePos && posData.count == 1 && isFrontier) {
         // We are at a "frontier", and only one node is
         // alive, so whatever the eventual best path is must
         // come through this node.  So we can safely commit
@@ -618,6 +617,7 @@ public final class KoreanTokenizer extends Tokenizer {
         } else {
           // This means the backtrace only produced
           // punctuation tokens, so we must keep parsing.
+          continue;
         }
       }