Posted to commits@lucene.apache.org by rm...@apache.org on 2012/02/09 23:17:45 UTC
svn commit: r1242557 - in /lucene/dev/trunk:
modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt
solr/example/solr/conf/lang/stopwords_ja.txt
Author: rmuir
Date: Thu Feb 9 22:17:44 2012
New Revision: 1242557
URL: http://svn.apache.org/viewvc?rev=1242557&view=rev
Log:
SOLR-3115: improve japanese stopwords.txt description
Modified:
lucene/dev/trunk/modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt
lucene/dev/trunk/solr/example/solr/conf/lang/stopwords_ja.txt
Modified: lucene/dev/trunk/modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt?rev=1242557&r1=1242556&r2=1242557&view=diff
==============================================================================
--- lucene/dev/trunk/modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt (original)
+++ lucene/dev/trunk/modules/analysis/kuromoji/src/resources/org/apache/lucene/analysis/kuromoji/stopwords.txt Thu Feb 9 22:17:44 2012
@@ -1,14 +1,19 @@
#
# This file defines a stopword set for Japanese.
#
-# The set is made up hand-picked frequent terms from taken from segmented Japanese
-# Wikipedia. Punctuation characters and frequent kanji have mostly been left out.
+# This set is made up of hand-picked frequent terms from segmented Japanese Wikipedia.
+# Punctuation characters and frequent kanji have mostly been left out. See LUCENE-3745
+# for frequency lists, etc. that can be useful for making your own set (if desired)
#
-# There is an overlap between these stopwords and the terms removed when used in
-# combination with the KuromojiPartOfSpeechStopFilter. When editing this file, note
+# Note that there is an overlap between these stopwords and the terms stopped when used
+# in combination with the KuromojiPartOfSpeechStopFilter. When editing this file, note
# that comments are not allowed on the same line as stopwords.
#
-# See LUCENE-3745 for frequency lists, etc. that can be useful for making your own set.
+# Also note that stopping is done in a case-insensitive manner. Change your StopFilter
+# configuration if you need case-sensitive stopping. Lastly, note that stopping is done
+# using the same character width as the entries in this file. Since this StopFilter is
+# normally done after a CJKWidthFilter in your chain, you would usually want your romaji
+# entries to be in half-width and your kana entries to be in full-width.
#
の
に
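The header above says that every non-comment line is a single stopword and that a comment may not share a line with an entry. The following is a minimal Java sketch of why that rule matters. It assumes a simple loader that treats only whole lines starting with '#' as comments and lowercases entries for case-insensitive lookup. The class and method names are hypothetical, and this is not Lucene's own loader.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of loading a stopword file under the rules described in the
 * header: whole lines starting with '#' are comments, blank lines are skipped, and
 * every other line is one stopword. This is NOT Lucene's own loader.
 */
public class StopwordFileSketch {

    static Set<String> load(String fileContents) {
        Set<String> words = new HashSet<>();
        for (String line : fileContents.split("\\r?\\n")) {
            String trimmed = line.trim();
            // Skip whole-line comments and empty lines.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // The entire line becomes the entry: a trailing "# comment" is NOT
            // stripped, which is why comments may not share a line with a stopword.
            // Entries are lowercased so that lookups can be case-insensitive.
            words.add(trimmed.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    public static void main(String[] args) {
        String file = "#\n# comment\n#\nの\nに # particle\n";
        Set<String> stop = load(file);
        System.out.println(stop.contains("の"));            // true
        System.out.println(stop.contains("に"));            // false: the entry is "に # particle"
        System.out.println(stop.contains("に # particle")); // true
    }
}
```

With an inline comment, the whole line becomes the entry, so the bare particle is silently never stopped.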
Modified: lucene/dev/trunk/solr/example/solr/conf/lang/stopwords_ja.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/conf/lang/stopwords_ja.txt?rev=1242557&r1=1242556&r2=1242557&view=diff
==============================================================================
--- lucene/dev/trunk/solr/example/solr/conf/lang/stopwords_ja.txt (original)
+++ lucene/dev/trunk/solr/example/solr/conf/lang/stopwords_ja.txt Thu Feb 9 22:17:44 2012
@@ -1,14 +1,19 @@
#
# This file defines a stopword set for Japanese.
#
-# The set is made up hand-picked frequent terms from taken from segmented Japanese
-# Wikipedia. Punctuation characters and frequent kanji have mostly been left out.
+# This set is made up of hand-picked frequent terms from segmented Japanese Wikipedia.
+# Punctuation characters and frequent kanji have mostly been left out. See LUCENE-3745
+# for frequency lists, etc. that can be useful for making your own set (if desired)
#
-# There is an overlap between these stopwords and the terms removed when used in
-# combination with the KuromojiPartOfSpeechStopFilter. When editing this file, note
+# Note that there is an overlap between these stopwords and the terms stopped when used
+# in combination with the KuromojiPartOfSpeechStopFilter. When editing this file, note
# that comments are not allowed on the same line as stopwords.
#
-# See LUCENE-3745 for frequency lists, etc. that can be useful for making your own set.
+# Also note that stopping is done in a case-insensitive manner. Change your StopFilter
+# configuration if you need case-sensitive stopping. Lastly, note that stopping is done
+# using the same character width as the entries in this file. Since this StopFilter is
+# normally done after a CJKWidthFilter in your chain, you would usually want your romaji
+# entries to be in half-width and your kana entries to be in full-width.
#
の
に
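The note about character width can be illustrated with a short Java sketch. The class and method names are hypothetical, and this is not Lucene's StopFilter. The sketch reproduces only one half of CJKWidthFilter's folding, mapping fullwidth ASCII variants (U+FF01..U+FF5E) to basic Latin, and omits the halfwidth-to-fullwidth katakana folding. Because width folding runs before the stop lookup, a romaji entry written in fullwidth never matches:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of why stopword entry width matters. Only the fullwidth ASCII
 * folding (U+FF01..U+FF5E to U+0021..U+007E) of a CJKWidthFilter-style step is shown;
 * halfwidth katakana folding is omitted.
 */
public class WidthAwareStopSketch {

    static String foldFullwidthAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                sb.append((char) (c - 0xFEE0)); // e.g. 'Ａ' (U+FF21) -> 'A' (U+0041)
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Width folding first, then a case-insensitive lookup, matching the chain order described above. */
    static boolean isStopword(String token, Set<String> stopwords) {
        String normalized = foldFullwidthAscii(token).toLowerCase(Locale.ROOT);
        return stopwords.contains(normalized);
    }

    public static void main(String[] args) {
        // Entries as they would appear after loading (already lowercased).
        Set<String> halfwidthEntry = new HashSet<>(Arrays.asList("abc"));
        Set<String> fullwidthEntry = new HashSet<>(Arrays.asList("ａｂｃ"));

        System.out.println(isStopword("ＡＢＣ", halfwidthEntry)); // true
        System.out.println(isStopword("ＡＢＣ", fullwidthEntry)); // false
    }
}
```

The same reasoning runs the other way for kana. A CJKWidthFilter-style step folds halfwidth katakana to fullwidth, so kana entries should be written in fullwidth.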