Posted to commits@lucene.apache.org by rm...@apache.org on 2020/02/16 05:36:28 UTC
[lucene-solr] branch jira/LUCENE-9220 updated: LUCENE-9220: regenerate all snowball stopfiles
This is an automated email from the ASF dual-hosted git repository.
rmuir pushed a commit to branch jira/LUCENE-9220
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git
The following commit(s) were added to refs/heads/jira/LUCENE-9220 by this push:
new 03aebec LUCENE-9220: regenerate all snowball stopfiles
03aebec is described below
commit 03aebecf98acab31c608dbcdbf8b5c038c3c02f7
Author: Robert Muir <rm...@apache.org>
AuthorDate: Sun Feb 16 00:36:15 2020 -0500
LUCENE-9220: regenerate all snowball stopfiles
---
.../lucene/analysis/snowball/danish_stop.txt | 8 +-
.../apache/lucene/analysis/snowball/dutch_stop.txt | 8 +-
.../lucene/analysis/snowball/english_stop.txt | 9 +-
.../lucene/analysis/snowball/finnish_stop.txt | 13 ++-
.../lucene/analysis/snowball/french_stop.txt | 10 +--
.../lucene/analysis/snowball/german_stop.txt | 6 +-
.../lucene/analysis/snowball/hungarian_stop.txt | 8 +-
.../lucene/analysis/snowball/indonesian_stop.txt | 99 ++++++++++++++++++++++
.../lucene/analysis/snowball/italian_stop.txt | 6 +-
.../lucene/analysis/snowball/norwegian_stop.txt | 12 +--
.../lucene/analysis/snowball/portuguese_stop.txt | 6 +-
.../lucene/analysis/snowball/russian_stop.txt | 7 +-
.../lucene/analysis/snowball/spanish_stop.txt | 6 +-
.../lucene/analysis/snowball/swedish_stop.txt | 6 +-
14 files changed, 151 insertions(+), 53 deletions(-)
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/danish_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/danish_stop.txt
index 42e6145..6e90e8f 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/danish_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/danish_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/danish/stop.txt
+ | From https://snowballstem.org/algorithms/danish/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
@@ -60,7 +60,7 @@ hvor | where
eller | or
hvad | what
skal | must/shall etc.
-selv | myself/youself/herself/ourselves etc., even
+selv | myself/yourself/herself/ourselves etc., even
her | here
alle | all/everyone/everybody etc.
vil | will (verb)
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/dutch_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/dutch_stop.txt
index 47a2aea..48c5515 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/dutch_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/dutch_stop.txt
@@ -1,12 +1,13 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/dutch/stop.txt
+ | From https://snowballstem.org/algorithms/dutch/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
+
| A Dutch stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.
@@ -117,3 +118,4 @@ uw | your
iemand | somebody
geweest | been; past participle of 'be'
andere | other
+
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/english_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/english_stop.txt
index 0385841..00902dc 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/english_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/english_stop.txt
@@ -1,12 +1,12 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/english/stop.txt
+ | From https://snowballstem.org/algorithms/english/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
-
+
| An English stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.
@@ -317,3 +317,4 @@ very
| old
| high
| long
+
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/finnish_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/finnish_stop.txt
index 4372c9a..c9ee2f1 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/finnish_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/finnish_stop.txt
@@ -1,12 +1,12 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/finnish/stop.txt
+ | From https://snowballstem.org/algorithms/finnish/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
-
+
| forms of BE
olla
@@ -48,8 +48,8 @@ me meidän meidät meitä meissä meistä meihin meillä meiltä meille
te teidän teidät teitä teissä teistä teihin teillä teiltä teille | you
he heidän heidät heitä heissä heistä heihin heillä heiltä heille | they
-tämä tämän tätä tässä tästä tähän tallä tältä tälle tänä täksi | this
-tuo tuon tuotä tuossa tuosta tuohon tuolla tuolta tuolle tuona tuoksi | that
+tämä tämän tätä tässä tästä tähän tällä tältä tälle tänä täksi | this
+tuo tuon tuota tuossa tuosta tuohon tuolla tuolta tuolle tuona tuoksi | that
se sen sitä siinä siitä siihen sillä siltä sille sinä siksi | it
nämä näiden näitä näissä näistä näihin näillä näiltä näille näinä näiksi | these
nuo noiden noita noissa noista noihin noilla noilta noille noina noiksi | those
@@ -91,7 +91,6 @@ yli | over, across
| other
kun | when
-niin | so
nyt | now
itse | self
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt
index 749abae..8fec2c9 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
+ | From https://snowballstem.org/algorithms/french/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
@@ -169,8 +169,8 @@ eussent
| Later additions (from Jean-Christophe Deschamps)
ceci | this
-cela | that
-celà | that
+cela | that (added 11 Apr 2012. Omission reported by Adrien Grand)
+celà | that (incorrect, though common)
cet | this
cette | this
ici | here
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/german_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/german_stop.txt
index 86525e7..804bbbd 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/german_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/german_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/german/stop.txt
+ | From https://snowballstem.org/algorithms/german/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/hungarian_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/hungarian_stop.txt
index 37526da..3fa279e 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/hungarian_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/hungarian_stop.txt
@@ -1,12 +1,12 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/hungarian/stop.txt
+ | From https://snowballstem.org/algorithms/hungarian/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
-
+
| Hungarian stop word list
| prepared by Anna Tordai
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/indonesian_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/indonesian_stop.txt
new file mode 100644
index 0000000..225560b
--- /dev/null
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/indonesian_stop.txt
@@ -0,0 +1,99 @@
+ | From https://snowballstem.org/algorithms/indonesian/stop.txt
+ | This file is distributed under the BSD License.
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
+ | - Encoding was converted to UTF-8.
+ | - This notice was added.
+ |
+ | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
+yang | that
+dan | and
+di | in
+dari | from
+ini | this
+pada kepada | at, to [person]
+ada adalah | there is, is
+dengan | with
+untuk | for
+dalam | in the
+oleh | by
+sebagai | as
+juga | also, too
+ke | to
+atau | or
+tidak | not
+itu | that
+sebuah | a
+tersebut | the
+dapat | can, may
+ia | he/she, yes
+telah | already
+satu | one
+memiliki | have
+mereka | they
+bahwa | that
+lebih | more, more than
+karena | because, since
+seorang | one person, same
+akan | will, about to
+seperti | as, like
+secara | on
+kemudian | later, then
+beberapa | some
+banyak | many
+antara | between
+setelah | after
+yaitu | that is
+hanya | only
+hingga | to
+serta | along with
+sama | same, and
+dia | he/she/it (informal)
+tetapi | but
+namun | however
+melalui | through
+bisa | can
+sehingga | so
+ketika | when
+suatu | a
+sendiri | own (adverb)
+bagi | for
+semua | all
+harus | must
+setiap | each, every
+maka | then
+maupun | as well
+tanpa | without
+saja | only
+jika | if
+bukan | not
+belum | not yet
+sedangkan | while
+yakni | i.e.
+meskipun | although
+hampir | almost
+kita | we/us (inclusive)
+demikian | thereby
+daripada | from/than/instead of
+apa | what/which/or/eh
+ialah | is
+sana | there
+begitu | so
+seseorang | someone
+selain | besides
+terlalu | too
+ataupun | or
+saya | me/I (formal)
+bila | if/when
+bagaimana | how
+tapi | but
+apabila | when/if
+kalau | if
+kami | we/us (exclusive)
+melainkan | but (rather)
+boleh | may,can
+aku | I/me (informal)
+anda | you (formal)
+kamu | you (informal)
+beliau | he/she/it (formal)
+kalian | you (plural)
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/italian_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/italian_stop.txt
index 1219cc7..c74160e 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/italian_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/italian_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/italian/stop.txt
+ | From https://snowballstem.org/algorithms/italian/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/norwegian_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/norwegian_stop.txt
index a7a2c28..f427609 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/norwegian_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/norwegian_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/norwegian/stop.txt
+ | From https://snowballstem.org/algorithms/norwegian/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
@@ -25,7 +25,7 @@ et | a/an
den | it/this/that
til | to
er | is/am/are
-som | who/that
+som | who/which/that
på | on
de | they / you(formal)
med | with
@@ -84,7 +84,6 @@ noen | some
noe | some
ville | would
dere | you
-som | who/which/that
deres | their/theirs
kun | only/just
ja | yes
@@ -129,7 +128,6 @@ mange | many
også | also
slik | just
vært | been
-være | to be
båe | both *
begge | both
siden | since
@@ -155,7 +153,6 @@ hennar | her/hers
hennes | hers
hoss | how *
hossen | how *
-ikkje | not *
ingi | noone *
inkje | noone *
korleis | how *
@@ -177,7 +174,6 @@ noka | some (fem.) *
nokor | some *
noko | some *
nokre | some *
-si | his/hers *
sia | since *
sidan | since *
so | so *
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/portuguese_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/portuguese_stop.txt
index acfeb01..d03d7f2 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/portuguese_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/portuguese_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/portuguese/stop.txt
+ | From https://snowballstem.org/algorithms/portuguese/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/russian_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/russian_stop.txt
index 5527140..65512d4 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/russian_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/russian_stop.txt
@@ -1,12 +1,13 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/russian/stop.txt
+ | From https://snowballstem.org/algorithms/russian/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
| NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
+
| a russian stop word list. comments begin with vertical bar. each stop
| word is at the start of a line.
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
index 487d78c..48bd65e 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/spanish/stop.txt
+ | From https://snowballstem.org/algorithms/spanish/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/swedish_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/swedish_stop.txt
index 096f87f..a5f056b 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/swedish_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/swedish_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/swedish/stop.txt
+ | From https://snowballstem.org/algorithms/swedish/stop.txt
| This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
|
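The regenerated headers say that these files need format="snowball" in StopFilterFactory. The English, Dutch and Russian headers describe the format: comments begin with a vertical bar, and stop words start at the beginning of a line. The Finnish and new Indonesian files also put several words on one line, as in "pada kepada".

Below is a minimal, hypothetical Java sketch of those parsing rules. It assumes that everything from the first '|' onward is a comment and that every remaining whitespace-separated token is a stop word. The class name SnowballStopParser is invented for illustration. The sketch does not use Lucene's own loader. As far as we know, that loader is WordlistLoader.getSnowballWordSet.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not Lucene's implementation) of reading a
 * Snowball-format stop word file, assuming these rules:
 *  - everything from the first '|' to the end of a line is a comment;
 *  - any remaining whitespace-separated tokens on the line are stop words.
 */
public final class SnowballStopParser {

  private SnowballStopParser() {}

  /** Parses Snowball-format text into an insertion-ordered set of stop words. */
  public static Set<String> parse(String text) {
    Set<String> words = new LinkedHashSet<>();
    for (String line : text.split("\\r?\\n")) {
      // Strip the comment part, if any.
      int bar = line.indexOf('|');
      String content = bar >= 0 ? line.substring(0, bar) : line;
      // A single line may hold several words, e.g. "pada kepada".
      for (String token : content.trim().split("\\s+")) {
        if (!token.isEmpty()) {
          words.add(token);
        }
      }
    }
    return words;
  }

  public static void main(String[] args) {
    String sample = String.join("\n",
        " | From https://snowballstem.org/algorithms/indonesian/stop.txt",
        " |",
        "yang | that",
        "pada kepada | at, to [person]",
        "",
        "boleh | may,can");
    // Header lines and blank lines contribute nothing.
    System.out.println(parse(sample)); // prints [yang, pada, kepada, boleh]
  }
}
```

Because a whole line is split on whitespace after its comment is removed, the sketch treats a multi-word line such as the Finnish "tämä tämän tätä ..." row the same way as a one-word line.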