Posted to solr-user@lucene.apache.org by igiguere <ig...@opentext.com> on 2013/05/22 22:14:14 UTC

Re: Russian stopwords

I'm encountering the same issue, but my Russian stopwords.txt IS encoded in
UTF-8.

I verified the encoding using EmEditor (I've used it for years, and I use it
for the existing English, French, Spanish, Portuguese and German Solr
configurations, without issues).
Just to make extra sure, I downloaded EditPlus, as mentioned in this
thread, and verified the encoding again: UTF-8.

I realize this may sound like a stupid question, but could there be any
issue other than encoding?

Thanks,



--
View this message in context: http://lucene.472066.n3.nabble.com/Russian-stopwords-tp491490p4065440.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Russian stopwords

Posted by igiguere <ig...@opentext.com>.
A colleague stumbled upon this:

http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding

The second answer, which sets the environment variable JAVA_TOOL_OPTIONS,
did the job.

JAVA_TOOL_OPTIONS : -Dfile.encoding=UTF8
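
For anyone wondering why a JVM setting matters when the file itself is fine, here is a minimal sketch. It assumes the stopword file is being decoded with the JVM default charset rather than an explicit UTF-8; the thread does not confirm what Solr does internally. The class name StopwordEncodingCheck is made up for illustration, and ISO-8859-1 stands in for whatever single-byte default a Western European Windows install might pick.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StopwordEncodingCheck {

    // Decode raw file bytes the way a reader configured with `charset` would.
    static String decode(byte[] fileBytes, Charset charset) {
        return new String(fileBytes, charset);
    }

    public static void main(String[] args) {
        // Charset the JVM uses whenever code does not name one explicitly.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // A three-letter Russian stopword ("chto"), written with Unicode escapes
        // so this source compiles the same way under any compiler encoding.
        String stopword = "\u0447\u0442\u043e";

        // What a UTF-8 stopwords.txt holds on disk: two bytes per Cyrillic letter.
        byte[] fileBytes = stopword.getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(fileBytes, StandardCharsets.UTF_8);
        // Assumption: ISO-8859-1 plays the role of a non-UTF-8 default charset.
        String asLatin1 = decode(fileBytes, StandardCharsets.ISO_8859_1);

        System.out.println("Decoded as UTF-8 matches:      " + stopword.equals(asUtf8));
        System.out.println("Decoded as ISO-8859-1 matches: " + stopword.equals(asLatin1));
        // Each byte becomes its own character, so the word no longer has 3 letters.
        System.out.println("ISO-8859-1 result length:      " + asLatin1.length());
    }
}
```

If the file is decoded this way, the stopword list ends up holding garbled strings that never match a real Russian token, which would look exactly like "stopwords are ignored" even though the file is valid UTF-8.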

Happy stop-wording!




--
View this message in context: http://lucene.472066.n3.nabble.com/Russian-stopwords-tp491490p4065976.html

Re: Russian stopwords

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Sounds like it may be an encoding-specific issue when you are _reading it
in_. See if you can change the default locale before starting the Java
process (I think it is an environment variable) and check whether that
makes a difference.
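
A quick way to compare two machines is to print what the JVM actually picked up at startup. This is only a sketch: the class name JvmEncodingReport is made up, and it reports the settings without proving which one Solr consults when it loads the stopword file.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class JvmEncodingReport {

    // Collect the JVM settings that can influence how text files are decoded.
    static String report() {
        return "default locale:  " + Locale.getDefault() + "\n"
             + "default charset: " + Charset.defaultCharset() + "\n"
             + "file.encoding:   " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Run it on both machines with the same Java installation that starts Solr. If the default charset differs between them, that is a likely suspect.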

If you have a very easy test-case, I would be happy to check it on Mac
and Windows. I know Russian (and UTF-8 issues).

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, May 24, 2013 at 9:01 AM, igiguere <ig...@opentext.com> wrote:
> Just so everyone knows :
>
> It turns out my stopwords.txt was OK after all.  It functions correctly on a
> Linux (ubuntu), and, strangely, on a colleague's Windows 7.  My computer is
> also Windows 7.  The only difference between the 2 Windows is the language
> of the interface (French for mine, English for my colleague).
>
> Strange... Very very strange.  I hope someone from Microsoft reads this
> someday.

Re: Russian stopwords

Posted by igiguere <ig...@opentext.com>.
Just so everyone knows:

It turns out my stopwords.txt was OK after all.  It functions correctly on
Linux (Ubuntu) and, strangely, on a colleague's Windows 7 machine.  My
computer also runs Windows 7.  The only difference between the two Windows
machines is the interface language (French on mine, English on my
colleague's).

Strange... Very very strange.  I hope someone from Microsoft reads this
someday.



--
View this message in context: http://lucene.472066.n3.nabble.com/Russian-stopwords-tp491490p4065910.html