Posted to dev@lucene.apache.org by "David Lesieur (JIRA)" <ji...@apache.org> on 2013/08/15 03:42:47 UTC

[jira] [Closed] (SOLR-5157) Broken French stop words example

     [ https://issues.apache.org/jira/browse/SOLR-5157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Lesieur closed SOLR-5157.
-------------------------------

    Reproduced In: 3.6.2
    
> Broken French stop words example
> --------------------------------
>
>                 Key: SOLR-5157
>                 URL: https://issues.apache.org/jira/browse/SOLR-5157
>             Project: Solr
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 3.6.2, 4.4
>            Reporter: David Lesieur
>            Priority: Trivial
>
> The French stop words example file distributed with Solr (example/solr/collection1/conf/lang/stopwords_fr.txt) appears to be broken. Most lines end with a comment that starts with a '|' character, and Solr apparently does not interpret these as comments. Here's a patch that could fix this by turning the comment-only lines into '#' comments and dropping the per-word glosses.
> {noformat}
> --- /tmp/solr-4.4.0/example/solr/collection1/conf/lang/stopwords_fr.txt	2013-07-10 11:11:37.000000000 -0400
> +++ stopwords_fr.txt	2013-08-14 20:33:36.168914026 -0400
> @@ -1,82 +1,78 @@
> - | From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
> - | This file is distributed under the BSD License.
> - | See http://snowball.tartarus.org/license.php
> - | Also see http://www.opensource.org/licenses/bsd-license.html
> - |  - Encoding was converted to UTF-8.
> - |  - This notice was added.
> +# From svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
> +# This file is distributed under the BSD License.
> +# See http://snowball.tartarus.org/license.php
> +# Also see http://www.opensource.org/licenses/bsd-license.html
> +#   - Encoding was converted to UTF-8.
> +#   - This notice was added.
> +
> +au
> +aux
> +avec
> +ce
> +ces
> +dans
> +de
> +des
> +du
> +elle
> +en
> +et
> +eux
> +il
> +je
> +la
> +le
> +leur
> +lui
> +ma
> +mais
> +me
> +même
> +mes
> +moi
> +mon
> +ne
> +nos
> +notre
> +nous
> +on
> +ou
> +par
> +pas
> +pour
> +qu
> +que
> +qui
> +sa
> +se
> +ses
> +son
> +sur
> +ta
> +te
> +tes
> +toi
> +ton
> +tu
> +un
> +une
> +vos
> +votre
> +vous
> +
> +# Single letter forms
> +c
> +d
> +j
> +l
> +à
> +m
> +n
> +s
> +t
> +y
>  
> - | A French stop word list. Comments begin with vertical bar. Each stop
> - | word is at the start of a line.
> -
> -au             |  a + le
> -aux            |  a + les
> -avec           |  with
> -ce             |  this
> -ces            |  these
> -dans           |  with
> -de             |  of
> -des            |  de + les
> -du             |  de + le
> -elle           |  she
> -en             |  `of them' etc
> -et             |  and
> -eux            |  them
> -il             |  he
> -je             |  I
> -la             |  the
> -le             |  the
> -leur           |  their
> -lui            |  him
> -ma             |  my (fem)
> -mais           |  but
> -me             |  me
> -même           |  same; as in moi-même (myself) etc
> -mes            |  me (pl)
> -moi            |  me
> -mon            |  my (masc)
> -ne             |  not
> -nos            |  our (pl)
> -notre          |  our
> -nous           |  we
> -on             |  one
> -ou             |  where
> -par            |  by
> -pas            |  not
> -pour           |  for
> -qu             |  que before vowel
> -que            |  that
> -qui            |  who
> -sa             |  his, her (fem)
> -se             |  oneself
> -ses            |  his (pl)
> -son            |  his, her (masc)
> -sur            |  on
> -ta             |  thy (fem)
> -te             |  thee
> -tes            |  thy (pl)
> -toi            |  thee
> -ton            |  thy (masc)
> -tu             |  thou
> -un             |  a
> -une            |  a
> -vos            |  your (pl)
> -votre          |  your
> -vous           |  you
> -
> -               |  single letter forms
> -
> -c              |  c'
> -d              |  d'
> -j              |  j'
> -l              |  l'
> -à              |  to, at
> -m              |  m'
> -n              |  n'
> -s              |  s'
> -t              |  t'
> -y              |  there
> -
> -               | forms of être (not including the infinitive):
> +# Forms of être (not including the infinitive):
>  été
>  étée
>  étées
> @@ -121,7 +117,7 @@
>  fussiez
>  fussent
>  
> -               | forms of avoir (not including the infinitive):
> +# Forms of avoir (not including the infinitive):
>  ayant
>  eu
>  eue
> @@ -165,20 +161,18 @@
>  eussiez
>  eussent
>  
> -               | Later additions (from Jean-Christophe Deschamps)
> -ceci           |  this
> -cela           |  that
> -celà           |  that
> -cet            |  this
> -cette          |  this
> -ici            |  here
> -ils            |  they
> -les            |  the (pl)
> -leurs          |  their (pl)
> -quel           |  which
> -quels          |  which
> -quelle         |  which
> -quelles        |  which
> -sans           |  without
> -soi            |  oneself
> -
> +# Later additions (from Jean-Christophe Deschamps)
> +ceci
> +celà
> +cet
> +cette
> +ici
> +ils
> +les
> +leurs
> +quel
> +quels
> +quelle
> +quelles
> +sans
> +soi
> {noformat}
> I'm not very familiar with this issue tracker. Please forgive me if I'm misusing it in any way...
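
The rewrite done by hand in the patch above can be approximated with a short script. This is a minimal Python sketch and not part of the original report: the function name snowball_to_plain is hypothetical, and it assumes that everything after the first '|' on a line is a comment and that stop words are separated by whitespace. Unlike the patch, it keeps every blank line and every comment-only line, and it does not preserve indentation inside comments.

```python
def snowball_to_plain(lines):
    """Convert Snowball-style stop word lines ('|' comments) to a plain
    word list that uses '#' comment lines.

    Assumption: text after the first '|' on a line is a comment.
    """
    out = []
    for raw in lines:
        # Split the line into the part before the first '|' and the comment.
        before, sep, comment = raw.rstrip("\n").partition("|")
        words = before.split()
        comment = comment.strip()
        if words:
            # Keep only the stop word(s); the trailing gloss is dropped,
            # as in the patch.
            out.extend(words)
        elif sep and comment:
            # A comment-only line becomes a '#' comment line.
            out.append("# " + comment)
        else:
            # Blank lines (and empty comments) are kept as blank lines.
            out.append("")
    return out


if __name__ == "__main__":
    sample = [
        " | This file is distributed under the BSD License.\n",
        "\n",
        "au             |  a + le\n",
        "été\n",
    ]
    print("\n".join(snowball_to_plain(sample)))
```

To convert a real file, the lines read from stopwords_fr.txt (opened with encoding="utf-8") could be passed to the function and the result written back out, one entry per line.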

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
