Posted to dev@lucene.apache.org by "David Lesieur (JIRA)" <ji...@apache.org> on 2013/08/15 03:40:47 UTC
[jira] [Commented] (SOLR-5157) Broken French stop words example
[ https://issues.apache.org/jira/browse/SOLR-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740553#comment-13740553 ]
David Lesieur commented on SOLR-5157:
-------------------------------------
Oops, you're right, I had missed that part. Thanks for the help!
> Broken French stop words example
> --------------------------------
>
> Key: SOLR-5157
> URL: https://issues.apache.org/jira/browse/SOLR-5157
> Project: Solr
> Issue Type: Bug
> Components: documentation
> Affects Versions: 3.6.2, 4.4
> Reporter: David Lesieur
> Priority: Trivial
>
> The French stop words example file that's distributed with Solr (in example/solr/collection1/conf/lang/stopwords_fr.txt) appears to be broken. Most lines include a comment that starts with a '|' character. Apparently these are not interpreted as comments by Solr. Here's a patch that could fix this.
> {noformat}
> --- /tmp/solr-4.4.0/example/solr/collection1/conf/lang/stopwords_fr.txt 2013-07-10 11:11:37.000000000 -0400
> +++ stopwords_fr.txt 2013-08-14 20:33:36.168914026 -0400
> @@ -1,82 +1,78 @@
> - | From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
> - | This file is distributed under the BSD License.
> - | See http://snowball.tartarus.org/license.php
> - | Also see http://www.opensource.org/licenses/bsd-license.html
> - | - Encoding was converted to UTF-8.
> - | - This notice was added.
> +# From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
> +# This file is distributed under the BSD License.
> +# See http://snowball.tartarus.org/license.php
> +# Also see http://www.opensource.org/licenses/bsd-license.html
> +# - Encoding was converted to UTF-8.
> +# - This notice was added.
> +
> +au
> +aux
> +avec
> +ce
> +ces
> +dans
> +de
> +des
> +du
> +elle
> +en
> +et
> +eux
> +il
> +je
> +la
> +le
> +leur
> +lui
> +ma
> +mais
> +me
> +même
> +mes
> +moi
> +mon
> +ne
> +nos
> +notre
> +nous
> +on
> +ou
> +par
> +pas
> +pour
> +qu
> +que
> +qui
> +sa
> +se
> +ses
> +son
> +sur
> +ta
> +te
> +tes
> +toi
> +ton
> +tu
> +un
> +une
> +vos
> +votre
> +vous
> +
> +# Single letter forms
> +c
> +d
> +j
> +l
> +à
> +m
> +n
> +s
> +t
> +y
>
> - | A French stop word list. Comments begin with vertical bar. Each stop
> - | word is at the start of a line.
> -
> -au | a + le
> -aux | a + les
> -avec | with
> -ce | this
> -ces | these
> -dans | with
> -de | of
> -des | de + les
> -du | de + le
> -elle | she
> -en | `of them' etc
> -et | and
> -eux | them
> -il | he
> -je | I
> -la | the
> -le | the
> -leur | their
> -lui | him
> -ma | my (fem)
> -mais | but
> -me | me
> -même | same; as in moi-même (myself) etc
> -mes | me (pl)
> -moi | me
> -mon | my (masc)
> -ne | not
> -nos | our (pl)
> -notre | our
> -nous | we
> -on | one
> -ou | where
> -par | by
> -pas | not
> -pour | for
> -qu | que before vowel
> -que | that
> -qui | who
> -sa | his, her (fem)
> -se | oneself
> -ses | his (pl)
> -son | his, her (masc)
> -sur | on
> -ta | thy (fem)
> -te | thee
> -tes | thy (pl)
> -toi | thee
> -ton | thy (masc)
> -tu | thou
> -un | a
> -une | a
> -vos | your (pl)
> -votre | your
> -vous | you
> -
> - | single letter forms
> -
> -c | c'
> -d | d'
> -j | j'
> -l | l'
> -à | to, at
> -m | m'
> -n | n'
> -s | s'
> -t | t'
> -y | there
> -
> - | forms of être (not including the infinitive):
> +# Forms of être (not including the infinitive):
> été
> étée
> étées
> @@ -121,7 +117,7 @@
> fussiez
> fussent
>
> - | forms of avoir (not including the infinitive):
> +# Forms of avoir (not including the infinitive):
> ayant
> eu
> eue
> @@ -165,20 +161,18 @@
> eussiez
> eussent
>
> - | Later additions (from Jean-Christophe Deschamps)
> -ceci | this
> -cela | that
> -celà | that
> -cet | this
> -cette | this
> -ici | here
> -ils | they
> -les | the (pl)
> -leurs | their (pl)
> -quel | which
> -quels | which
> -quelle | which
> -quelles | which
> -sans | without
> -soi | oneself
> -
> +# Later additions (from Jean-Christophe Deschamps)
> +ceci
> +celà
> +cet
> +cette
> +ici
> +ils
> +les
> +leurs
> +quel
> +quels
> +quelle
> +quelles
> +sans
> +soi
> {noformat}
> I'm not very familiar with this issue tracker. Please forgive me if I'm misusing it in any way...
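
For reference, the conversion the patch above performs by hand can be sketched in a few lines of Python. This is only an illustration of the transformation shown in the diff, not anything shipped with Solr: the function name snowball_to_plain is hypothetical, and it assumes that '|' starts a comment, that comment-only lines should become '#' comments, and that the trailing gloss after a stop word can simply be dropped. Unlike the patch, it does not capitalize the section comments.

```python
def snowball_to_plain(lines):
    """Convert Snowball-style stop word lines ('word | gloss') into
    plain lines with one stop word each and '#' for comments.

    Hypothetical helper, written only to mirror the patch above.
    """
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Everything after the first '|' is treated as a comment (assumption).
        words, sep, comment = line.partition("|")
        words = words.split()
        comment = comment.strip()
        if words:
            # Keep the stop word(s) and drop the trailing gloss, as the patch does.
            out.extend(words)
        elif sep and comment:
            # A comment-only line becomes a '#' comment.
            out.append("# " + comment)
        else:
            # Preserve blank lines so the sections stay visually separated.
            out.append("")
    return out


if __name__ == "__main__":
    sample = [
        " | A French stop word list.",
        "",
        "au | a + le",
        "même | same; as in moi-même (myself) etc",
        "été",
    ]
    print("\n".join(snowball_to_plain(sample)))
```

Lines that carry no '|' at all, such as the forms of être and avoir, pass through unchanged, which matches the context lines in the diff.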