Posted to commits@doris.apache.org by ji...@apache.org on 2023/06/21 03:06:49 UTC

[doris-thirdparty] branch clucene updated: [fix](jiebafix ) cut word greater than 225 heap-use-after-free (#96)

This is an automated email from the ASF dual-hosted git repository.

jianliangqi pushed a commit to branch clucene
in repository https://gitbox.apache.org/repos/asf/doris-thirdparty.git


The following commit(s) were added to refs/heads/clucene by this push:
     new 103e88a8 [fix](jiebafix ) cut word greater than 225 heap-use-after-free (#96)
103e88a8 is described below

commit 103e88a8a3b24da9ae2a0d9908a3ceb3f7808a61
Author: zzzxl <33...@users.noreply.github.com>
AuthorDate: Wed Jun 21 11:06:44 2023 +0800

    [fix](jiebafix ) cut word greater than 225 heap-use-after-free (#96)
---
 src/contribs-lib/CLucene/analysis/jieba/ChineseTokenizer.cpp | 2 +-
 src/contribs-lib/CLucene/analysis/jieba/ChineseTokenizer.h   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/contribs-lib/CLucene/analysis/jieba/ChineseTokenizer.cpp b/src/contribs-lib/CLucene/analysis/jieba/ChineseTokenizer.cpp
index bf4ea1db..e6acd64f 100644
--- a/src/contribs-lib/CLucene/analysis/jieba/ChineseTokenizer.cpp
+++ b/src/contribs-lib/CLucene/analysis/jieba/ChineseTokenizer.cpp
@@ -8,7 +8,7 @@ CL_NS_USE(analysis)
 CL_NS_USE(util)
 
 ChineseTokenizer::ChineseTokenizer(lucene::util::Reader *reader, AnalyzerMode m) : Tokenizer(reader), mode(m) {
-    buffer[0] = 0;
+    memset(buffer, 0, LUCENE_MAX_WORD_LEN + 1);
 }
 
 void ChineseTokenizer::init(const std::string &dictPath) {
diff --git a/src/contribs-lib/CLucene/analysis/jieba/ChineseTokenizer.h b/src/contribs-lib/CLucene/analysis/jieba/ChineseTokenizer.h
index d94e9b1c..48de52b1 100644
--- a/src/contribs-lib/CLucene/analysis/jieba/ChineseTokenizer.h
+++ b/src/contribs-lib/CLucene/analysis/jieba/ChineseTokenizer.h
@@ -44,7 +44,7 @@ private:
      * character buffer, store the characters which are used to compose <br>
      * the returned Token
      */
-    TCHAR buffer[LUCENE_MAX_WORD_LEN]{};
+    TCHAR buffer[LUCENE_MAX_WORD_LEN + 1]{};
 
     /**
      * I/O buffer, used to store the content of the input(one of the <br>

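For context, here is a minimal standalone sketch of the off-by-one hazard this change guards against. It assumes LUCENE_MAX_WORD_LEN is 255 and that the tokenizer null-terminates the cut word it copies into the fixed buffer; copyToken, the constant value, and the surrounding scaffolding are illustrative stand-ins, not the actual CLucene code:

    #include <algorithm>
    #include <cstring>
    #include <cwchar>
    #include <string>

    // Illustrative stand-ins for the CLucene names used in the diff.
    using TCHAR = wchar_t;
    constexpr size_t LUCENE_MAX_WORD_LEN = 255;  // assumed value, not taken from the patch

    // Copy one segmented ("cut") word into the fixed token buffer, truncating
    // if needed. Because the buffer holds LUCENE_MAX_WORD_LEN + 1 elements,
    // the terminating 0 stays in bounds even when the word fills the whole
    // limit; with only LUCENE_MAX_WORD_LEN elements, that write would land
    // one element past the end of the array.
    void copyToken(const std::wstring& word, TCHAR (&buffer)[LUCENE_MAX_WORD_LEN + 1]) {
        const size_t n = std::min(word.size(), LUCENE_MAX_WORD_LEN);
        std::wmemcpy(buffer, word.data(), n);
        buffer[n] = 0;  // always valid: n <= LUCENE_MAX_WORD_LEN
    }

    int main() {
        TCHAR buffer[LUCENE_MAX_WORD_LEN + 1];
        std::memset(buffer, 0, sizeof(buffer));      // zero the whole buffer up front, in the spirit of the constructor change
        copyToken(std::wstring(300, L'x'), buffer);  // a word longer than the limit
        return std::wcslen(buffer) == LUCENE_MAX_WORD_LEN ? 0 : 1;
    }

Sizing the array as LUCENE_MAX_WORD_LEN + 1 keeps the terminating 0 in bounds even for a maximum-length cut word, and zero-initializing the buffer in the constructor means it reads as an empty, terminated string before the first token is produced.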
