Posted to dev@lucene.apache.org by "David Lesieur (JIRA)" <ji...@apache.org> on 2013/08/15 03:18:48 UTC

[jira] [Created] (SOLR-5157) Broken French stop words example

David Lesieur created SOLR-5157:
-----------------------------------

             Summary: Broken French stop words example
                 Key: SOLR-5157
                 URL: https://issues.apache.org/jira/browse/SOLR-5157
             Project: Solr
          Issue Type: Bug
          Components: documentation
    Affects Versions: 4.4, 3.6.2
            Reporter: David Lesieur
            Priority: Trivial


The French stop words example file distributed with Solr (example/solr/collection1/conf/lang/stopwords_fr.txt) appears to be broken. Most lines carry a comment introduced by a '|' character, and Solr apparently does not interpret these as comments. Here's a patch that could fix this.

{noformat}
--- /tmp/solr-4.4.0/example/solr/collection1/conf/lang/stopwords_fr.txt	2013-07-10 11:11:37.000000000 -0400
+++ stopwords_fr.txt	2013-08-14 20:33:36.168914026 -0400
@@ -1,82 +1,78 @@
- | From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
- | This file is distributed under the BSD License.
- | See http://snowball.tartarus.org/license.php
- | Also see http://www.opensource.org/licenses/bsd-license.html
- |  - Encoding was converted to UTF-8.
- |  - This notice was added.
+# From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
+# This file is distributed under the BSD License.
+# See http://snowball.tartarus.org/license.php
+# Also see http://www.opensource.org/licenses/bsd-license.html
+#   - Encoding was converted to UTF-8.
+#   - This notice was added.
+
+au
+aux
+avec
+ce
+ces
+dans
+de
+des
+du
+elle
+en
+et
+eux
+il
+je
+la
+le
+leur
+lui
+ma
+mais
+me
+même
+mes
+moi
+mon
+ne
+nos
+notre
+nous
+on
+ou
+par
+pas
+pour
+qu
+que
+qui
+sa
+se
+ses
+son
+sur
+ta
+te
+tes
+toi
+ton
+tu
+un
+une
+vos
+votre
+vous
+
+# Single letter forms
+c
+d
+j
+l
+à
+m
+n
+s
+t
+y
 
- | A French stop word list. Comments begin with vertical bar. Each stop
- | word is at the start of a line.
-
-au             |  a + le
-aux            |  a + les
-avec           |  with
-ce             |  this
-ces            |  these
-dans           |  with
-de             |  of
-des            |  de + les
-du             |  de + le
-elle           |  she
-en             |  `of them' etc
-et             |  and
-eux            |  them
-il             |  he
-je             |  I
-la             |  the
-le             |  the
-leur           |  their
-lui            |  him
-ma             |  my (fem)
-mais           |  but
-me             |  me
-même           |  same; as in moi-même (myself) etc
-mes            |  me (pl)
-moi            |  me
-mon            |  my (masc)
-ne             |  not
-nos            |  our (pl)
-notre          |  our
-nous           |  we
-on             |  one
-ou             |  where
-par            |  by
-pas            |  not
-pour           |  for
-qu             |  que before vowel
-que            |  that
-qui            |  who
-sa             |  his, her (fem)
-se             |  oneself
-ses            |  his (pl)
-son            |  his, her (masc)
-sur            |  on
-ta             |  thy (fem)
-te             |  thee
-tes            |  thy (pl)
-toi            |  thee
-ton            |  thy (masc)
-tu             |  thou
-un             |  a
-une            |  a
-vos            |  your (pl)
-votre          |  your
-vous           |  you
-
-               |  single letter forms
-
-c              |  c'
-d              |  d'
-j              |  j'
-l              |  l'
-à              |  to, at
-m              |  m'
-n              |  n'
-s              |  s'
-t              |  t'
-y              |  there
-
-               | forms of être (not including the infinitive):
+# Forms of être (not including the infinitive):
 été
 étée
 étées
@@ -121,7 +117,7 @@
 fussiez
 fussent
 
-               | forms of avoir (not including the infinitive):
+# Forms of avoir (not including the infinitive):
 ayant
 eu
 eue
@@ -165,20 +161,18 @@
 eussiez
 eussent
 
-               | Later additions (from Jean-Christophe Deschamps)
-ceci           |  this
-cela           |  that
-celà           |  that
-cet            |  this
-cette          |  this
-ici            |  here
-ils            |  they
-les            |  the (pl)
-leurs          |  their (pl)
-quel           |  which
-quels          |  which
-quelle         |  which
-quelles        |  which
-sans           |  without
-soi            |  oneself
-
+# Later additions (from Jean-Christophe Deschamps)
+ceci
+celà
+cet
+cette
+ici
+ils
+les
+leurs
+quel
+quels
+quelle
+quelles
+sans
+soi
{noformat}
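
For illustration, the same kind of conversion could be scripted instead of edited by hand. This is a minimal Python sketch, not part of the patch: the function name snowball_to_wordset is hypothetical, and it assumes the target format is one word per line, with comments only on lines of their own starting with '#'. Unlike the patch, it does not adjust the capitalization of comments.

```python
def snowball_to_wordset(text: str) -> str:
    """Rewrite '|'-commented stop word text as one word per line with '#' comments.

    Hypothetical helper for illustration; it is not a Solr or Lucene API.
    """
    out = []
    for line in text.splitlines():
        # Everything after the first '|' is treated as a comment.
        words_part, sep, comment = line.partition("|")
        words = words_part.split()
        if words:
            # Keep the stop word(s) and drop the trailing gloss.
            out.extend(words)
        elif sep:
            # A comment-only line becomes a '#' comment line.
            out.append(("# " + comment.strip()).rstrip())
        else:
            # Preserve blank lines as separators.
            out.append("")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        " | A French stop word list.\n"
        "\n"
        "au             |  a + le\n"
        "               |  single letter forms\n"
        "c              |  c'\n"
    )
    print(snowball_to_wordset(sample), end="")
```

Run against the original stopwords_fr.txt, a script like this would produce a file similar in shape to the patched one, but the result should be reviewed by hand before use.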

I'm not very familiar with this issue tracker. Please forgive me if I'm misusing it in any way...
