Posted to commits@lucene.apache.org by dw...@apache.org on 2021/10/19 07:45:56 UTC

[lucene] branch main updated: LUCENE-10166: removed module-level README.txt and modified a few links, removed a few obsolete instructions from 20 years ago. (#379)

This is an automated email from the ASF dual-hosted git repository.

dweiss pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/lucene.git


The following commit(s) were added to refs/heads/main by this push:
     new e290f91  LUCENE-10166: removed module-level README.txt and modified a few links, removed a few obsolete instructions from 20 years ago. (#379)
e290f91 is described below

commit e290f91bb233f33cde4b2249d676298d5740e8b1
Author: Dawid Weiss <da...@carrotsearch.com>
AuthorDate: Tue Oct 19 09:45:49 2021 +0200

    LUCENE-10166: removed module-level README.txt and modified a few links, removed a few obsolete instructions from 20 years ago. (#379)
---
 NOTICE.txt                                         |  2 +-
 lucene/analysis/README.txt                         | 70 ----------------------
 lucene/analysis/common/README.txt                  | 22 -------
 lucene/analysis/common/checksums.properties        |  2 -
 .../analysis/de/GermanNormalizationFilter.java     |  4 +-
 .../apache/lucene/analysis/en/PorterStemmer.java   |  2 +-
 .../lucene/analysis/snowball/package-info.java     | 31 ++++------
 lucene/analysis/stempel/src/java/overview.html     |  2 +-
 lucene/misc/README.txt                             |  3 -
 .../apache/lucene/misc/SweetSpotSimilarity.java    |  2 +-
 .../java/org/apache/lucene/misc/package-info.java  |  2 +-
 11 files changed, 18 insertions(+), 124 deletions(-)

diff --git a/NOTICE.txt b/NOTICE.txt
index 54d00bc..b9bfbc9 100644
--- a/NOTICE.txt
+++ b/NOTICE.txt
@@ -52,7 +52,7 @@ The snowball stopword lists in
   analysis/common/src/resources/org/apache/lucene/analysis/snowball
 were developed by Martin Porter and Richard Boulton.
 The full snowball package is available from
-  http://snowball.tartarus.org/
+  https://snowballstem.org/
 
 The KStem stemmer in
   analysis/common/src/org/apache/lucene/analysis/en
diff --git a/lucene/analysis/README.txt b/lucene/analysis/README.txt
deleted file mode 100644
index 09857df..0000000
--- a/lucene/analysis/README.txt
+++ /dev/null
@@ -1,70 +0,0 @@
-Analysis README file
-
-INTRODUCTION
-
-The Analysis Module provides analysis capabilities to Lucene and Solr
-applications.
-
-The Lucene web site is at:
-  http://lucene.apache.org/
-
-Please join the Lucene-User mailing list by sending a message to:
-  java-user-subscribe@lucene.apache.org
-
-FILES
-
-lucene-analysis-common-XX.jar
-  The primary analysis module library, containing general-purpose analysis
-  components and support for various languages.
-
-lucene-analysis-icu-XX.jar
-  An add-on analysis library that provides improved Unicode support via
-  International Components for Unicode (ICU). Note: this module depends on
-  the ICU4j jar file (version >= 4.6.0)
-
-lucene-analysis-kuromoji-XX.jar
-  An analyzer with morphological analysis for Japanese.
-
-lucene-analysis-morfologik-XX.jar
-  An analyzer using the Morfologik stemming library.
-
-lucene-analysis-nori-XX.jar
-  An analyzer with morphological analysis for Korean.
-
-lucene-analysis-opennlp-XX.jar
-  An analyzer using the OpenNLP natural-language processing library.
-
-lucene-analysis-phonetic-XX.jar
-  An add-on analysis library that provides phonetic encoders via Apache
-  Commons-Codec. Note: this module depends on the commons-codec jar 
-  file
-  
-lucene-analysis-smartcn-XX.jar
-  An add-on analysis library that provides word segmentation for Simplified
-  Chinese.
-
-lucene-analysis-stempel-XX.jar
-  An add-on analysis library that contains a universal algorithmic stemmer,
-  including tables for the Polish language.
-
-common/src/java
-icu/src/java
-kuromoji/src/java
-morfologik/src/java
-nori/src/java
-opennlp/src/java
-phonetic/src/java
-smartcn/src/java
-stempel/src/java
-  The source code for the libraries.
-
-common/src/test
-icu/src/test
-kuromoji/src/test
-morfologik/src/test
-nori/src/test
-opennlp/src/test
-phonetic/src/test
-smartcn/src/test
-stempel/src/test
-  Unit tests for the libraries.
diff --git a/lucene/analysis/common/README.txt b/lucene/analysis/common/README.txt
deleted file mode 100644
index 2d6cac9..0000000
--- a/lucene/analysis/common/README.txt
+++ /dev/null
@@ -1,22 +0,0 @@
-Lucene Analyzers README file
-
-This project provides pre-compiled version of the Snowball stemmers,
-now located at https://github.com/snowballstem/snowball/tree/53739a805cfa6c77ff8496dc711dc1c106d987c1 (GitHub),
-together with classes integrating them with the Lucene search engine.
-
-The snowball tree needs patches applied to properly generate efficient code for lucene.
-You can regenerate everything with 'gradlew snowball'
-Refer to gradle/generation/snowball* files in the build for upgrading snowball.
-
-IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY!
-
-An index created using the Snowball module in Lucene 2.3.2 and below
-might not be compatible with the Snowball module in Lucene 2.4 or greater.
-
-For more information about this issue see:
-https://issues.apache.org/jira/browse/LUCENE-1142
-
-
-For more information on Snowball, see:
-  http://snowball.tartarus.org/
-
diff --git a/lucene/analysis/common/checksums.properties b/lucene/analysis/common/checksums.properties
deleted file mode 100644
index ce06580..0000000
--- a/lucene/analysis/common/checksums.properties
+++ /dev/null
@@ -1,2 +0,0 @@
-
-checksum.jflexClassicTokenizerImpl=8c4eac5fd02be551e666783df5531afda23cbc96
\ No newline at end of file
diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanNormalizationFilter.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanNormalizationFilter.java
index ed5cff3..dd02480 100644
--- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanNormalizationFilter.java
+++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/de/GermanNormalizationFilter.java
@@ -24,8 +24,8 @@ import org.apache.lucene.analysis.util.StemmerUtil;
 
 /**
  * Normalizes German characters according to the heuristics of the <a
- * href="http://snowball.tartarus.org/algorithms/german2/stemmer.html">German2 snowball
- * algorithm</a>. It allows for the fact that ä, ö and ü are sometimes written as ae, oe and ue.
+ * href="https://snowballstem.org/algorithms/german2/stemmer.html">German2 snowball algorithm</a>.
+ * It allows for the fact that ä, ö and ü are sometimes written as ae, oe and ue.
  *
  * <ul>
  *   <li>'ß' is replaced by 'ss'
diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/PorterStemmer.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/PorterStemmer.java
index d424f3f..3804c8a 100644
--- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/PorterStemmer.java
+++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/PorterStemmer.java
@@ -23,7 +23,7 @@ package org.apache.lucene.analysis.en;
        Porter, 1980, An algorithm for suffix stripping, Program, Vol. 14,
        no. 3, pp 130-137,
 
-   See also http://www.tartarus.org/~martin/PorterStemmer/index.html
+   See also https://snowballstem.org/algorithms/porter/stemmer.html
 
    Bug 1 (reported by Gonzalo Parra 16/10/99) fixed as marked below.
    Tthe words 'aed', 'eed', 'oed' leave k at 'a' for step 3, and b[k-1]
diff --git a/lucene/analysis/common/src/java/org/apache/lucene/analysis/snowball/package-info.java b/lucene/analysis/common/src/java/org/apache/lucene/analysis/snowball/package-info.java
index d23432d..d0f2f70 100644
--- a/lucene/analysis/common/src/java/org/apache/lucene/analysis/snowball/package-info.java
+++ b/lucene/analysis/common/src/java/org/apache/lucene/analysis/snowball/package-info.java
@@ -17,30 +17,21 @@
 
 /**
  * {@link org.apache.lucene.analysis.TokenFilter} and {@link org.apache.lucene.analysis.Analyzer}
- * implementations that use Snowball stemmers.
+ * implementations that use a modified version of <a href="https://snowballstem.org/">Snowball
+ * stemmers</a>. See <a href="https://snowballstem.org/">Snowball project page</a> for more
+ * information about the original algorithms used.
  *
- * <p>This project provides pre-compiled version of the Snowball stemmers based on revision 500 of
- * the Tartarus Snowball repository, together with classes integrating them with the Lucene search
- * engine.
+ * <p>Lucene snowball classes require a few patches to the original Snowball source tree to generate
+ * more efficient code.
  *
- * <p>A few changes has been made to the static Snowball code and compiled stemmers:
- *
- * <ul>
- *   <li>Class SnowballProgram is made abstract and contains new abstract method stem() to avoid
- *       reflection in Lucene filter class SnowballFilter.
- *   <li>All use of StringBuffers has been refactored to StringBuilder for speed.
- *   <li>Snowball BSD license header has been added to the Java classes to avoid having RAT adding
- *       ASL headers.
- * </ul>
- *
- * <p>See the Snowball <a href ="http://snowball.tartarus.org/">home page</a> for more information
- * about the algorithms.
+ * <p>Refer to {@code gradle/generation/snowball*} and {@code help/regeneration.txt} files in Lucene
+ * source code for instructions on how code regeneration from Snowball sources works, what
+ * modifications are applied and what is required to regenerate snowball analyzers from scratch.
  *
  * <p><b>IMPORTANT NOTICE ON BACKWARDS COMPATIBILITY!</b>
  *
- * <p>An index created using the Snowball module in Lucene 2.3.2 and below might not be compatible
- * with the Snowball module in Lucene 2.4 or greater.
- *
- * <p>For more information about this issue see: https://issues.apache.org/jira/browse/LUCENE-1142
+ * <p>An index created using the Snowball module in one Lucene version may not be compatible with an
+ * index created with another Lucene version. The token stream will vary depending on the changes in
+ * snowball stemmer definitions.
  */
 package org.apache.lucene.analysis.snowball;
diff --git a/lucene/analysis/stempel/src/java/overview.html b/lucene/analysis/stempel/src/java/overview.html
index dfb3b90..6f3def6 100644
--- a/lucene/analysis/stempel/src/java/overview.html
+++ b/lucene/analysis/stempel/src/java/overview.html
@@ -98,7 +98,7 @@ heuristic rules<br>
 </ul>
 There are many existing and well-known implementations of stemmers for
 English (Porter, Lovins, Krovetz) and other European languages
-(<a href="http://snowball.tartarus.org">Snowball</a>). There are also
+(<a href="https://snowballstem.org/">Snowball</a>). There are also
 good quality commercial lemmatizers for Polish. However, there is only
 one
 freely available Polish stemmer, implemented by
diff --git a/lucene/misc/README.txt b/lucene/misc/README.txt
deleted file mode 100644
index 44560ee..0000000
--- a/lucene/misc/README.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-miscellaneous is a home of different Lucene-related classes
-that all belong to org.apache.lucene.misc package, as they are not
-substantial enough to warrant their own package.
diff --git a/lucene/misc/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java b/lucene/misc/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java
index b5097fe..3b667f3 100644
--- a/lucene/misc/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java
+++ b/lucene/misc/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java
@@ -129,7 +129,7 @@ public class SweetSpotSimilarity extends ClassicSimilarity {
    *  (x &lt;= min) &#63; base : sqrt(x+(base**2)-min)
    * </code> ...but with a special case check for 0.
    *
-   * <p>This degrates to <code>sqrt(x)</code> when min and base are both 0
+   * <p>This degrades to <code>sqrt(x)</code> when min and base are both 0
    *
    * @see #setBaselineTfFactors
    * @see <a href="doc-files/ss.baselineTf.svg">An SVG visualization of this function</a>
diff --git a/lucene/misc/src/java/org/apache/lucene/misc/package-info.java b/lucene/misc/src/java/org/apache/lucene/misc/package-info.java
index 3fd5f32..f4b3242 100644
--- a/lucene/misc/src/java/org/apache/lucene/misc/package-info.java
+++ b/lucene/misc/src/java/org/apache/lucene/misc/package-info.java
@@ -15,5 +15,5 @@
  * limitations under the License.
  */
 
-/** Miscellaneous index tools. */
+/** Miscellaneous Lucene utilities that don't really fit anywhere else. */
 package org.apache.lucene.misc;