Posted to commits@lucene.apache.org by ji...@apache.org on 2019/02/01 10:39:02 UTC
[lucene-solr] branch branch_8_0 updated: LUCENE-8676: The Korean
tokenizer does not update the last position if the backtrace is caused by a
big buffer (1024 chars).
This is an automated email from the ASF dual-hosted git repository.
jimczi pushed a commit to branch branch_8_0
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git
The following commit(s) were added to refs/heads/branch_8_0 by this push:
new bae3e24 LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused by a big buffer (1024 chars).
bae3e24 is described below
commit bae3e24e8bcdac9a07d2b0592cba72bed2e5365e
Author: jimczi <ji...@apache.org>
AuthorDate: Fri Feb 1 11:37:16 2019 +0100
LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused by a big buffer (1024 chars).
---
lucene/CHANGES.txt | 3 +++
.../nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java | 2 +-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt
index bd11e62..11c074e 100644
--- a/lucene/CHANGES.txt
+++ b/lucene/CHANGES.txt
@@ -293,6 +293,9 @@ Bug fixes:
was not propagating final position increments from its child streams correctly.
(Dan Meehl, Alan Woodward)
+* LUCENE-8676: The Korean tokenizer does not update the last position if the backtrace is caused
+ by a big buffer (1024 chars). (Jim Ferenczi)
+
New Features
* LUCENE-8026: ExitableDirectoryReader may now time out queries that run on
diff --git a/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java b/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java
index 012352c..8875fd0 100644
--- a/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java
+++ b/lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanTokenizer.java
@@ -535,7 +535,6 @@ public final class KoreanTokenizer extends Tokenizer {
}
if (pos > lastBackTracePos && posData.count == 1 && isFrontier) {
- // if (pos > lastBackTracePos && posData.count == 1 && isFrontier) {
// We are at a "frontier", and only one node is
// alive, so whatever the eventual best path is must
// come through this node. So we can safely commit
@@ -618,6 +617,7 @@ public final class KoreanTokenizer extends Tokenizer {
} else {
// This means the backtrace only produced
// punctuation tokens, so we must keep parsing.
+ continue;
}
}
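One way to see why a single added `continue;` can matter is a toy model of the loop shape involved. The sketch below is hypothetical and heavily simplified. It is not KoreanTokenizer code. It assumes a parse loop that reads per-position state at the top of each iteration, plus a forced-commit safety branch (standing in for the big-buffer backtrace) that may move `pos` ahead. The names `StaleStateSketch`, `walk`, `MAX_GAP` and `jumpTarget` are invented for illustration, and `MAX_GAP` is 3 rather than 1024 to keep the demo short. If that branch falls through instead of continuing, the rest of the iteration runs with state that was read for the old position.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a parse loop with a forced-commit safety; not Lucene code. */
public class StaleStateSketch {
  // Assumed small gap for the demo; the commit message refers to a 1024-char buffer.
  static final int MAX_GAP = 3;

  /**
   * Walks positions 0..data.length-1 and returns "pos:state" pairs recorded by the
   * normal path. When the safety fires, it jumps pos forward to jumpTarget (if that
   * is ahead). With withContinue == false, that iteration falls through and records
   * state that was read for the old position.
   */
  static List<String> walk(int[] data, int jumpTarget, boolean withContinue) {
    List<String> recorded = new ArrayList<>();
    int lastCommitPos = 0;
    int pos = 0;
    while (pos < data.length) {
      int state = data[pos]; // snapshot for the current position
      if (pos - lastCommitPos >= MAX_GAP) {
        // Forced commit: pretend the best partial path ends further ahead.
        lastCommitPos = Math.max(pos, jumpTarget);
        pos = lastCommitPos;
        if (withContinue) {
          continue; // restart the loop so 'state' is re-read for the new pos
        }
      }
      recorded.add(pos + ":" + state);
      pos++;
    }
    return recorded;
  }

  public static void main(String[] args) {
    int[] data = {10, 11, 12, 13, 14, 15, 16, 17};
    System.out.println(walk(data, 5, true));
    System.out.println(walk(data, 5, false));
  }
}
```

With data {10, ..., 17} and jumpTarget 5, the continuing variant records [0:10, 1:11, 2:12, 5:15, 6:16, 7:17], where every state matches its position. The fall-through variant records 5:13 instead of 5:15, pairing position 5 with state left over from position 3. Restarting the iteration is the simplest way to guarantee that everything after the safety branch sees state for the position it is actually at.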