Posted to dev@lucene.apache.org by "David Lesieur (JIRA)" <ji...@apache.org> on 2013/08/15 03:18:48 UTC
[jira] [Created] (SOLR-5157) Broken French stop words example
David Lesieur created SOLR-5157:
-----------------------------------
Summary: Broken French stop words example
Key: SOLR-5157
URL: https://issues.apache.org/jira/browse/SOLR-5157
Project: Solr
Issue Type: Bug
Components: documentation
Affects Versions: 4.4, 3.6.2
Reporter: David Lesieur
Priority: Trivial
The French stop words example file distributed with Solr (example/solr/collection1/conf/lang/stopwords_fr.txt) appears to be broken. Most lines carry a trailing comment that starts with a '|' character, and Solr apparently does not interpret these as comments. Here's a patch that could fix this by switching the comment lines to '#' and dropping the per-word glosses.
{noformat}
--- /tmp/solr-4.4.0/example/solr/collection1/conf/lang/stopwords_fr.txt 2013-07-10 11:11:37.000000000 -0400
+++ stopwords_fr.txt 2013-08-14 20:33:36.168914026 -0400
@@ -1,82 +1,78 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
- | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
- | - Encoding was converted to UTF-8.
- | - This notice was added.
+# From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
+# This file is distributed under the BSD License.
+# See http://snowball.tartarus.org/license.php
+# Also see http://www.opensource.org/licenses/bsd-license.html
+# - Encoding was converted to UTF-8.
+# - This notice was added.
+
+au
+aux
+avec
+ce
+ces
+dans
+de
+des
+du
+elle
+en
+et
+eux
+il
+je
+la
+le
+leur
+lui
+ma
+mais
+me
+même
+mes
+moi
+mon
+ne
+nos
+notre
+nous
+on
+ou
+par
+pas
+pour
+qu
+que
+qui
+sa
+se
+ses
+son
+sur
+ta
+te
+tes
+toi
+ton
+tu
+un
+une
+vos
+votre
+vous
+
+# Single letter forms
+c
+d
+j
+l
+à
+m
+n
+s
+t
+y
- | A French stop word list. Comments begin with vertical bar. Each stop
- | word is at the start of a line.
-
-au | a + le
-aux | a + les
-avec | with
-ce | this
-ces | these
-dans | with
-de | of
-des | de + les
-du | de + le
-elle | she
-en | `of them' etc
-et | and
-eux | them
-il | he
-je | I
-la | the
-le | the
-leur | their
-lui | him
-ma | my (fem)
-mais | but
-me | me
-même | same; as in moi-même (myself) etc
-mes | me (pl)
-moi | me
-mon | my (masc)
-ne | not
-nos | our (pl)
-notre | our
-nous | we
-on | one
-ou | where
-par | by
-pas | not
-pour | for
-qu | que before vowel
-que | that
-qui | who
-sa | his, her (fem)
-se | oneself
-ses | his (pl)
-son | his, her (masc)
-sur | on
-ta | thy (fem)
-te | thee
-tes | thy (pl)
-toi | thee
-ton | thy (masc)
-tu | thou
-un | a
-une | a
-vos | your (pl)
-votre | your
-vous | you
-
- | single letter forms
-
-c | c'
-d | d'
-j | j'
-l | l'
-à | to, at
-m | m'
-n | n'
-s | s'
-t | t'
-y | there
-
- | forms of être (not including the infinitive):
+# Forms of être (not including the infinitive):
été
étée
étées
@@ -121,7 +117,7 @@
fussiez
fussent
- | forms of avoir (not including the infinitive):
+# Forms of avoir (not including the infinitive):
ayant
eu
eue
@@ -165,20 +161,18 @@
eussiez
eussent
- | Later additions (from Jean-Christophe Deschamps)
-ceci | this
-cela | that
-celà | that
-cet | this
-cette | this
-ici | here
-ils | they
-les | the (pl)
-leurs | their (pl)
-quel | which
-quels | which
-quelle | which
-quelles | which
-sans | without
-soi | oneself
-
+# Later additions (from Jean-Christophe Deschamps)
+ceci
+celà
+cet
+cette
+ici
+ils
+les
+leurs
+quel
+quels
+quelle
+quelles
+sans
+soi
{noformat}
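For anyone who would rather regenerate the file than apply the patch by hand, the same kind of conversion could be sketched roughly as below. This is only an illustration and not part of the patch: the function name convert_snowball_stopwords is made up, and the sketch assumes that everything after the first '|' on a line is a comment and that a line may hold more than one whitespace-separated word. Unlike the patch, it does not capitalize or drop any comment lines.

```python
def convert_snowball_stopwords(text: str) -> str:
    """Rewrite '|'-commented stop word text so that comments use '#'.

    Hypothetical helper for illustration; assumes everything after the
    first '|' on a line is a comment.
    """
    out = []
    for line in text.splitlines():
        # Split the line into the word part and the optional '|' comment.
        words_part, sep, comment = line.partition("|")
        words = words_part.split()
        if words:
            # Keep the stop word(s), one per line, and drop the trailing gloss.
            out.extend(words)
        elif sep and comment.strip():
            # A comment-only line becomes a '#' comment line.
            out.append("# " + comment.strip())
        else:
            # Preserve blank lines (and lines holding only a bare '|').
            out.append("")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = " | A French stop word list.\n\nau | a + le\nmême | same\n"
    print(convert_snowball_stopwords(sample), end="")
```

Reading the original file as UTF-8 and writing the result back out with the same encoding should keep accented entries such as "même" and "à" intact.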
I'm not very familiar with this issue tracker. Please forgive me if I'm misusing it in any way...