Posted to commits@lucene.apache.org by rm...@apache.org on 2020/02/16 05:36:28 UTC

[lucene-solr] branch jira/LUCENE-9220 updated: LUCENE-9220: regenerate all snowball stopfiles

This is an automated email from the ASF dual-hosted git repository.

rmuir pushed a commit to branch jira/LUCENE-9220
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git


The following commit(s) were added to refs/heads/jira/LUCENE-9220 by this push:
     new 03aebec  LUCENE-9220: regenerate all snowball stopfiles
03aebec is described below

commit 03aebecf98acab31c608dbcdbf8b5c038c3c02f7
Author: Robert Muir <rm...@apache.org>
AuthorDate: Sun Feb 16 00:36:15 2020 -0500

    LUCENE-9220: regenerate all snowball stopfiles
---
 .../lucene/analysis/snowball/danish_stop.txt       |  8 +-
 .../apache/lucene/analysis/snowball/dutch_stop.txt |  8 +-
 .../lucene/analysis/snowball/english_stop.txt      |  9 +-
 .../lucene/analysis/snowball/finnish_stop.txt      | 13 ++-
 .../lucene/analysis/snowball/french_stop.txt       | 10 +--
 .../lucene/analysis/snowball/german_stop.txt       |  6 +-
 .../lucene/analysis/snowball/hungarian_stop.txt    |  8 +-
 .../lucene/analysis/snowball/indonesian_stop.txt   | 99 ++++++++++++++++++++++
 .../lucene/analysis/snowball/italian_stop.txt      |  6 +-
 .../lucene/analysis/snowball/norwegian_stop.txt    | 12 +--
 .../lucene/analysis/snowball/portuguese_stop.txt   |  6 +-
 .../lucene/analysis/snowball/russian_stop.txt      |  7 +-
 .../lucene/analysis/snowball/spanish_stop.txt      |  6 +-
 .../lucene/analysis/snowball/swedish_stop.txt      |  6 +-
 14 files changed, 151 insertions(+), 53 deletions(-)

diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/danish_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/danish_stop.txt
index 42e6145..6e90e8f 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/danish_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/danish_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/danish/stop.txt
+ | From https://snowballstem.org/algorithms/danish/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
@@ -60,7 +60,7 @@ hvor         | where
 eller        | or
 hvad         | what
 skal         | must/shall etc.
-selv         | myself/youself/herself/ourselves etc., even
+selv         | myself/yourself/herself/ourselves etc., even
 her          | here
 alle         | all/everyone/everybody etc.
 vil          | will (verb)
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/dutch_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/dutch_stop.txt
index 47a2aea..48c5515 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/dutch_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/dutch_stop.txt
@@ -1,12 +1,13 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/dutch/stop.txt
+ | From https://snowballstem.org/algorithms/dutch/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
  | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
 
+
  | A Dutch stop word list. Comments begin with vertical bar. Each stop
  | word is at the start of a line.
 
@@ -117,3 +118,4 @@ uw             |  your
 iemand         |  somebody
 geweest        |  been; past participle of 'be'
 andere         |  other
+
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/english_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/english_stop.txt
index 0385841..00902dc 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/english_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/english_stop.txt
@@ -1,12 +1,12 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/english/stop.txt
+ | From https://snowballstem.org/algorithms/english/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
  | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
- 
+
  | An English stop word list. Comments begin with vertical bar. Each stop
  | word is at the start of a line.
 
@@ -317,3 +317,4 @@ very
     | old
     | high
     | long
+
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/finnish_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/finnish_stop.txt
index 4372c9a..c9ee2f1 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/finnish_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/finnish_stop.txt
@@ -1,12 +1,12 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/finnish/stop.txt
+ | From https://snowballstem.org/algorithms/finnish/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
  | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
- 
+
 | forms of BE
 
 olla
@@ -48,8 +48,8 @@ me     meidän meidät meitä  meissä  meistä  meihin meillä  meiltä  meille
 te     teidän teidät teitä  teissä  teistä  teihin teillä  teiltä  teille                | you
 he     heidän heidät heitä  heissä  heistä  heihin heillä  heiltä  heille                | they
 
-tämä   tämän         tätä   tässä   tästä   tähän  tallä   tältä   tälle   tänä   täksi  | this
-tuo    tuon          tuotä  tuossa  tuosta  tuohon tuolla  tuolta  tuolle  tuona  tuoksi | that
+tämä   tämän         tätä   tässä   tästä   tähän  tällä   tältä   tälle   tänä   täksi  | this
+tuo    tuon          tuota  tuossa  tuosta  tuohon tuolla  tuolta  tuolle  tuona  tuoksi | that
 se     sen           sitä   siinä   siitä   siihen sillä   siltä   sille   sinä   siksi  | it
 nämä   näiden        näitä  näissä  näistä  näihin näillä  näiltä  näille  näinä  näiksi | these
 nuo    noiden        noita  noissa  noista  noihin noilla  noilta  noille  noina  noiksi | those
@@ -91,7 +91,6 @@ yli     | over, across
 | other
 
 kun    | when
-niin   | so
 nyt    | now
 itse   | self
 
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt
index 749abae..8fec2c9 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
+ | From https://snowballstem.org/algorithms/french/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
@@ -169,8 +169,8 @@ eussent
 
                | Later additions (from Jean-Christophe Deschamps)
 ceci           |  this
-cela           |  that
-celà           |  that
+cela           |  that (added 11 Apr 2012. Omission reported by Adrien Grand)
+celà           |  that (incorrect, though common)
 cet            |  this
 cette          |  this
 ici            |  here
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/german_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/german_stop.txt
index 86525e7..804bbbd 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/german_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/german_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/german/stop.txt
+ | From https://snowballstem.org/algorithms/german/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/hungarian_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/hungarian_stop.txt
index 37526da..3fa279e 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/hungarian_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/hungarian_stop.txt
@@ -1,12 +1,12 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/hungarian/stop.txt
+ | From https://snowballstem.org/algorithms/hungarian/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
  | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
- 
+
 | Hungarian stop word list
 | prepared by Anna Tordai
 
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/indonesian_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/indonesian_stop.txt
new file mode 100644
index 0000000..225560b
--- /dev/null
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/indonesian_stop.txt
@@ -0,0 +1,99 @@
+ | From https://snowballstem.org/algorithms/indonesian/stop.txt
+ | This file is distributed under the BSD License.
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
+ |  - Encoding was converted to UTF-8.
+ |  - This notice was added.
+ |
+ | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
+yang		| that
+dan		| and
+di		| in
+dari		| from
+ini		| this
+pada kepada	| at, to [person]
+ada adalah	| there is, is
+dengan		| with
+untuk		| for
+dalam		| in the
+oleh		| by
+sebagai		| as
+juga		| also, too
+ke		| to
+atau		| or
+tidak		| not
+itu		| that
+sebuah		| a
+tersebut	| the
+dapat		| can, may
+ia		| he/she, yes
+telah		| already
+satu		| one
+memiliki	| have
+mereka		| they
+bahwa		| that
+lebih		| more, more than
+karena		| because, since
+seorang		| one person, same
+akan		| will, about to
+seperti		| as, like
+secara		| on
+kemudian	| later, then
+beberapa	| some
+banyak		| many
+antara		| between
+setelah		| after
+yaitu		| that is
+hanya		| only
+hingga		| to
+serta		| along with
+sama		| same, and
+dia		| he/she/it (informal)
+tetapi		| but
+namun		| however
+melalui		| through
+bisa		| can
+sehingga	| so
+ketika		| when
+suatu		| a
+sendiri		| own (adverb)
+bagi		| for
+semua		| all
+harus		| must
+setiap		| each, every
+maka		| then
+maupun		| as well
+tanpa		| without
+saja		| only
+jika		| if
+bukan		| not
+belum		| not yet
+sedangkan	| while
+yakni		| i.e.
+meskipun	| although
+hampir		| almost
+kita		| we/us (inclusive)
+demikian	| thereby
+daripada	| from/than/instead of
+apa		| what/which/or/eh
+ialah		| is
+sana		| there
+begitu		| so
+seseorang	| someone
+selain		| besides
+terlalu		| too
+ataupun		| or
+saya		| me/I (formal)
+bila		| if/when
+bagaimana	| how
+tapi		| but
+apabila		| when/if
+kalau		| if
+kami		| we/us (exclusive)
+melainkan	| but (rather)
+boleh		| may,can
+aku		| I/me (informal)
+anda		| you (formal)
+kamu		| you (informal)
+beliau		| he/she/it (formal)
+kalian		| you (plural)
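Each regenerated file carries the note that StopFilterFactory needs format="snowball". In that format, text after a vertical bar is a comment, and a line may list more than one word, as in "pada kepada" in the new Indonesian file or the multi-column Finnish lines. The sketch below shows how such a file can be read. It assumes the reader behaves roughly like Lucene's snowball-format loader: it strips everything from the first '|' onward and then splits the rest on whitespace. SnowballStopParser is a hypothetical name and not a Lucene class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch of a "snowball"-format stop word reader.
 * Assumption: this approximates what Lucene does for format="snowball";
 * it is not Lucene's actual implementation.
 */
public class SnowballStopParser {

  public static Set<String> parse(String text) throws IOException {
    Set<String> words = new LinkedHashSet<>(); // keeps file order, drops duplicates
    try (BufferedReader br = new BufferedReader(new StringReader(text))) {
      String line;
      while ((line = br.readLine()) != null) {
        // Everything from the first vertical bar onward is a comment.
        int bar = line.indexOf('|');
        if (bar >= 0) {
          line = line.substring(0, bar);
        }
        // A line may hold several words separated by spaces or tabs;
        // blank and comment-only lines contribute nothing.
        for (String word : line.trim().split("\\s+")) {
          if (!word.isEmpty()) {
            words.add(word);
          }
        }
      }
    }
    return words;
  }

  public static void main(String[] args) throws IOException {
    String sample =
        " | NOTE: To use this file with StopFilterFactory, you must specify format=\"snowball\"\n"
            + "yang\t\t| that\n"
            + "pada kepada\t| at, to [person]\n"
            + "ada adalah\t| there is, is\n";
    System.out.println(parse(sample)); // [yang, pada, kepada, ada, adalah]
  }
}
```

Because the result is a set, a word listed twice (like the duplicate Norwegian "som" that this commit removes) ends up as a single entry in this sketch.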
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/italian_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/italian_stop.txt
index 1219cc7..c74160e 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/italian_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/italian_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/italian/stop.txt
+ | From https://snowballstem.org/algorithms/italian/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/norwegian_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/norwegian_stop.txt
index a7a2c28..f427609 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/norwegian_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/norwegian_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/norwegian/stop.txt
+ | From https://snowballstem.org/algorithms/norwegian/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
@@ -25,7 +25,7 @@ et             | a/an
 den            | it/this/that
 til            | to
 er             | is/am/are
-som            | who/that
+som            | who/which/that
 på             | on
 de             | they / you(formal)
 med            | with
@@ -84,7 +84,6 @@ noen           | some
 noe            | some
 ville          | would
 dere           | you
-som            | who/which/that
 deres          | their/theirs
 kun            | only/just
 ja             | yes
@@ -129,7 +128,6 @@ mange          | many
 også           | also
 slik           | just
 vært           | been
-være           | to be
 båe            | both *
 begge          | both
 siden          | since
@@ -155,7 +153,6 @@ hennar         | her/hers
 hennes         | hers
 hoss           | how *
 hossen         | how *
-ikkje          | not *
 ingi           | noone *
 inkje          | noone *
 korleis        | how *
@@ -177,7 +174,6 @@ noka           | some (fem.) *
 nokor          | some *
 noko           | some *
 nokre          | some *
-si             | his/hers *
 sia            | since *
 sidan          | since *
 so             | so *
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/portuguese_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/portuguese_stop.txt
index acfeb01..d03d7f2 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/portuguese_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/portuguese_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/portuguese/stop.txt
+ | From https://snowballstem.org/algorithms/portuguese/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/russian_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/russian_stop.txt
index 5527140..65512d4 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/russian_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/russian_stop.txt
@@ -1,12 +1,13 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/russian/stop.txt
+ | From https://snowballstem.org/algorithms/russian/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
  | NOTE: To use this file with StopFilterFactory, you must specify format="snowball"
 
+
  | a russian stop word list. comments begin with vertical bar. each stop
  | word is at the start of a line.
 
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
index 487d78c..48bd65e 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/spanish/stop.txt
+ | From https://snowballstem.org/algorithms/spanish/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |
diff --git a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/swedish_stop.txt b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/swedish_stop.txt
index 096f87f..a5f056b 100644
--- a/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/swedish_stop.txt
+++ b/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/swedish_stop.txt
@@ -1,7 +1,7 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/swedish/stop.txt
+ | From https://snowballstem.org/algorithms/swedish/stop.txt
  | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
+ | See https://snowballstem.org/license.html
+ | Also see https://opensource.org/licenses/bsd-license.html
  |  - Encoding was converted to UTF-8.
  |  - This notice was added.
  |