Posted to issues@lucene.apache.org by "JerryChin (via GitHub)" <gi...@apache.org> on 2023/05/16 16:16:16 UTC
[GitHub] [lucene] JerryChin opened a new pull request, #12299: GITHUB-12291: Skip blank lines from stopwords list.
JerryChin opened a new pull request, #12299:
URL: https://github.com/apache/lucene/pull/12299
### Description
Hi team,
This PR fixes #12291: it skips blank lines when loading stopwords with `WordlistLoader#getWordSet`.
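To illustrate the behavior change, here is a minimal standalone sketch. It is not the Lucene code itself: the class `BlankLineSkipSketch` and its `load` method are hypothetical, and a plain `LinkedHashSet` stands in for `CharArraySet`. The `skipBlank` flag only exists here to compare the old and new behavior side by side.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the loader loop; not part of Lucene.
class BlankLineSkipSketch {

  // Reads one word per line. With skipBlank == false every trimmed line is added,
  // so a blank line ends up as the empty string "" in the set.
  static Set<String> load(String text, boolean skipBlank) {
    Set<String> result = new LinkedHashSet<>();
    try (BufferedReader br = new BufferedReader(new StringReader(text))) {
      String word;
      while ((word = br.readLine()) != null) {
        word = word.trim();
        if (skipBlank && word.isEmpty()) continue; // skip blank lines
        result.add(word);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }

  public static void main(String[] args) {
    String stopwords = "the\n\n  and  \n   \nof\n";
    System.out.println(load(stopwords, false).contains("")); // true
    System.out.println(load(stopwords, true));               // [the, and, of]
  }
}
```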
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12299: GITHUB-12291: Skip blank lines from stopwords list.
Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on PR #12299:
URL: https://github.com/apache/lucene/pull/12299#issuecomment-1552064934
Looks fine. I will merge this, but please add a CHANGES.txt entry in the 9.7 section.
Thanks for taking care of the issue! 👍
[GitHub] [lucene] matthias-mueller commented on a diff in pull request #12299: GITHUB-12291: Skip blank lines from stopwords list.
Posted by "matthias-mueller (via GitHub)" <gi...@apache.org>.
matthias-mueller commented on code in PR #12299:
URL: https://github.com/apache/lucene/pull/12299#discussion_r1196179445
##########
lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java:
##########
@@ -53,7 +53,10 @@ public static CharArraySet getWordSet(Reader reader, CharArraySet result) throws
try (BufferedReader br = getBufferedReader(reader)) {
String word = null;
while ((word = br.readLine()) != null) {
- result.add(word.trim());
+ word = word.trim();
+ // skip blank lines
+ if (word.length() == 0) continue;
Review Comment:
Would `word.strip()` be better here? https://stackoverflow.com/a/51266583
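For context on the question, a small sketch of how the two methods differ, assuming Java 11 or later (where `String.strip()` is available). `trim()` only removes characters up to U+0020, while `strip()` uses `Character.isWhitespace`. The class and method names below are hypothetical and only serve the illustration.

```java
// Hypothetical helper class; compares trim() and strip() on a Unicode whitespace character.
class TrimVersusStripSketch {

  // U+3000 IDEOGRAPHIC SPACE: Unicode whitespace, but above the U+0020 cutoff used by trim().
  static final String IDEOGRAPHIC_SPACE = "\u3000";

  static boolean blankAfterTrim(String line) {
    return line.trim().isEmpty();
  }

  static boolean blankAfterStrip(String line) {
    return line.strip().isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(blankAfterTrim(IDEOGRAPHIC_SPACE));  // false
    System.out.println(blankAfterStrip(IDEOGRAPHIC_SPACE)); // true
    System.out.println(blankAfterTrim(" \t"));              // true
  }
}
```

With `trim()`, a line holding only an ideographic space would therefore still count as a non-empty word.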
[GitHub] [lucene] JerryChin commented on pull request #12299: GITHUB-12291: Skip blank lines from stopwords list.
Posted by "JerryChin (via GitHub)" <gi...@apache.org>.
JerryChin commented on PR #12299:
URL: https://github.com/apache/lucene/pull/12299#issuecomment-1552327703
Hi @uschindler, which category should I put it under? How about `Improvements`?
[GitHub] [lucene] matthias-mueller commented on a diff in pull request #12299: GITHUB-12291: Skip blank lines from stopwords list.
Posted by "matthias-mueller (via GitHub)" <gi...@apache.org>.
matthias-mueller commented on code in PR #12299:
URL: https://github.com/apache/lucene/pull/12299#discussion_r1196380870
##########
lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java:
##########
@@ -53,7 +53,10 @@ public static CharArraySet getWordSet(Reader reader, CharArraySet result) throws
try (BufferedReader br = getBufferedReader(reader)) {
String word = null;
while ((word = br.readLine()) != null) {
- result.add(word.trim());
+ word = word.trim();
+ // skip blank lines
+ if (word.length() == 0) continue;
Review Comment:
@uschindler then sorry for the noise - I was triggered by `SmartCHINESEAnalyzer` and the lack of Unicode whitespace support in `String.trim()`.
[GitHub] [lucene] uschindler commented on pull request #12299: GITHUB-12291: Skip blank lines from stopwords list.
Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on PR #12299:
URL: https://github.com/apache/lucene/pull/12299#issuecomment-1553192538
I will merge this to 9.x when back at home.
[GitHub] [lucene] uschindler commented on a diff in pull request #12299: GITHUB-12291: Skip blank lines from stopwords list.
Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on code in PR #12299:
URL: https://github.com/apache/lucene/pull/12299#discussion_r1196362845
##########
lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java:
##########
@@ -53,7 +53,10 @@ public static CharArraySet getWordSet(Reader reader, CharArraySet result) throws
try (BufferedReader br = getBufferedReader(reader)) {
String word = null;
while ((word = br.readLine()) != null) {
- result.add(word.trim());
+ word = word.trim();
+ // skip blank lines
+ if (word.length() == 0) continue;
Review Comment:
I don't want to change this here as this is unrelated.
[GitHub] [lucene] uschindler commented on pull request #12299: GITHUB-12291: Skip blank lines from stopwords list.
Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on PR #12299:
URL: https://github.com/apache/lucene/pull/12299#issuecomment-1552603008
Isn't it a bugfix? Before this change, we had an empty stopword in the set.
[GitHub] [lucene] uschindler merged pull request #12299: GITHUB-12291: Skip blank lines from stopwords list.
Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler merged PR #12299:
URL: https://github.com/apache/lucene/pull/12299
[GitHub] [lucene] uschindler commented on a diff in pull request #12299: GITHUB-12291: Skip blank lines from stopwords list.
Posted by "uschindler (via GitHub)" <gi...@apache.org>.
uschindler commented on code in PR #12299:
URL: https://github.com/apache/lucene/pull/12299#discussion_r1196018225
##########
lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java:
##########
@@ -117,7 +120,10 @@ public static CharArraySet getWordSet(Reader reader, String comment, CharArraySe
String word = null;
while ((word = br.readLine()) != null) {
if (word.startsWith(comment) == false) {
- result.add(word.trim());
+ word = word.trim();
+ // skip blank lines
+ if (word.length() == 0) continue;
Review Comment:
Use `word.isEmpty()`.
##########
lucene/core/src/java/org/apache/lucene/analysis/WordlistLoader.java:
##########
@@ -53,7 +53,10 @@ public static CharArraySet getWordSet(Reader reader, CharArraySet result) throws
try (BufferedReader br = getBufferedReader(reader)) {
String word = null;
while ((word = br.readLine()) != null) {
- result.add(word.trim());
+ word = word.trim();
+ // skip blank lines
+ if (word.length() == 0) continue;
Review Comment:
Use `word.isEmpty()`.
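A rough sketch of what the comment-aware loop could look like with `word.isEmpty()` applied. This is an illustration under assumptions, not the merged code: `CommentAwareLoaderSketch` and `load` are hypothetical names, and a `LinkedHashSet` replaces `CharArraySet`. The order of checks follows the hunk above, so the comment prefix is tested before trimming.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the comment-aware overload; not part of Lucene.
class CommentAwareLoaderSketch {

  // Adds each non-comment, non-blank line to the result, trimmed.
  static Set<String> load(String text, String comment) {
    Set<String> result = new LinkedHashSet<>();
    try (BufferedReader br = new BufferedReader(new StringReader(text))) {
      String word;
      while ((word = br.readLine()) != null) {
        if (word.startsWith(comment) == false) {
          word = word.trim();
          // skip blank lines; isEmpty() states the intent more directly than length() == 0
          if (word.isEmpty()) continue;
          result.add(word);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }

  public static void main(String[] args) {
    String stopwords = "# header\nthe\n\n  and \n#x\n";
    System.out.println(load(stopwords, "#")); // [the, and]
  }
}
```

Because the prefix check runs on the untrimmed line, an indented comment line would not be recognized as a comment in this sketch.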