Posted to solr-user@lucene.apache.org by Blargy <zm...@hotmail.com> on 2010/07/13 18:55:03 UTC

Foreign characters question

I am trying to add the following synonym while indexing/searching

swimsuit, bañadores, bañador

I tested searching for "bañadores", but it didn't return any results.
After further inspection, I noticed in the field analysis admin that swimsuit
gets expanded to ba�adores. Not sure if it will show up, but the "ñ" shows up
as a black diamond with a white question mark in it.
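That black diamond is U+FFFD, the Unicode replacement character, which a decoder substitutes for bytes it cannot interpret. A minimal Python sketch, assuming the synonyms file was saved as ISO-8859-1 (where "ñ" is the single byte 0xF1) and then read back as UTF-8, reproduces the symptom:

```python
# Sketch: reproduce the "ba\ufffdadores" symptom, assuming the file was saved
# in a single-byte charset (ISO-8859-1) but read back as UTF-8.
word = "bañadores"

# In ISO-8859-1, "ñ" is the single byte 0xF1.
latin1_bytes = word.encode("iso-8859-1")

# To a UTF-8 decoder, 0xF1 starts a multi-byte sequence that the next byte
# ("a") does not continue, so a lenient decoder emits U+FFFD in its place.
misread = latin1_bytes.decode("utf-8", errors="replace")

# Saved as UTF-8, "ñ" becomes the two bytes 0xC3 0xB1 and round-trips cleanly.
utf8_bytes = word.encode("utf-8")
roundtrip = utf8_bytes.decode("utf-8")

print(latin1_bytes)       # b'ba\xf1adores'
print(ascii(misread))     # 'ba\ufffdadores'
print(utf8_bytes)         # b'ba\xc3\xb1adores'
print(roundtrip == word)  # True
```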

So basically, how can I add support for foreign characters?  Thanks
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Foreign-characters-question-tp964078p964078.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Foreign characters question

Posted by Robert Muir <rc...@gmail.com>.
On Wed, Jul 14, 2010 at 12:59 PM, Blargy <zm...@hotmail.com> wrote:

>
> Never mind. Apparently my IDE (NetBeans) was set to "No encoding"... wtf.
> Changed it to UTF-8 and recreated the file and all is good now. Thanks!
>
>
FYI, I created an issue with your example here:
https://issues.apache.org/jira/browse/SOLR-2003

In this case, Solr could have detected the wrong encoding and saved you some
time...
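One way to catch this kind of mistake early is to decode the file strictly instead of substituting replacement characters. The sketch below is not Solr's loader and not necessarily what SOLR-2003 proposes; `load_synonym_groups` is a hypothetical helper that handles only the comma-separated form from the first message (not the `=>` mapping form) and fails with the offending line number when the bytes are not valid UTF-8:

```python
from typing import List


def load_synonym_groups(raw: bytes) -> List[List[str]]:
    """Parse comma-separated synonym groups from raw file bytes.

    Hypothetical helper: decodes strictly as UTF-8 so that a file saved in
    another charset raises an error instead of silently producing U+FFFD.
    """
    groups = []
    # Splitting bytes on line breaks is safe for UTF-8: the bytes 0x0A and
    # 0x0D never occur inside a multi-byte sequence.
    for lineno, line in enumerate(raw.splitlines(), start=1):
        try:
            text = line.decode("utf-8")  # strict: invalid bytes raise
        except UnicodeDecodeError as err:
            raise ValueError(
                f"synonyms line {lineno} is not valid UTF-8 "
                f"(bad byte at offset {err.start})"
            ) from err
        text = text.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        groups.append([term.strip() for term in text.split(",")])
    return groups


if __name__ == "__main__":
    line = "swimsuit, bañadores, bañador\n"
    print(len(load_synonym_groups(line.encode("utf-8"))[0]))  # 3
    try:
        load_synonym_groups(line.encode("iso-8859-1"))
    except ValueError as err:
        print(err)  # synonyms line 1 is not valid UTF-8 (bad byte at offset 12)
```

Failing loudly trades convenience for visibility: a single stray byte stops the load, but the error points at the exact line instead of leaving a silently broken synonym.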

-- 
Robert Muir
rcmuir@gmail.com

Re: Foreign characters question

Posted by Blargy <zm...@hotmail.com>.
Never mind. Apparently my IDE (NetBeans) was set to "No encoding"... wtf.
Changed it to UTF-8 and recreated the file and all is good now. Thanks!

Re: Foreign characters question

Posted by Blargy <zm...@hotmail.com>.
How can I tell whether a synonyms file is UTF-8, and how do I create one? Do I
have to instruct Solr that this file is UTF-8?
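A minimal Python sketch for both halves of the question, assuming the old file was written in Windows-1252 (a common Windows default; the thread does not say which encoding NetBeans actually used). The helpers `is_valid_utf8` and `convert_to_utf8` and the file names are hypothetical:

```python
import tempfile
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")  # strict decoding raises on invalid sequences
    except UnicodeDecodeError:
        return False
    return True


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Re-save src as UTF-8, assuming non-UTF-8 input is in source_encoding."""
    data = src.read_bytes()
    if is_valid_utf8(data):
        text = data.decode("utf-8")  # already UTF-8 (or plain ASCII)
    else:
        text = data.decode(source_encoding)  # assumed legacy encoding
    # Write bytes directly so line endings are left exactly as they were.
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names for the demonstration.
        src = Path(tmp) / "synonyms-cp1252.txt"
        dst = Path(tmp) / "synonyms.txt"
        src.write_bytes("swimsuit, bañadores, bañador\n".encode("cp1252"))
        print(is_valid_utf8(src.read_bytes()))  # False
        convert_to_utf8(src, dst)
        print(is_valid_utf8(dst.read_bytes()))  # True
```

A strict decode can only show that bytes are *not* UTF-8; it cannot tell you which legacy charset they were written in, so the `source_encoding` default is a guess to adjust.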

Re: Foreign characters question

Posted by Robert Muir <rc...@gmail.com>.
Is your synonyms file in UTF-8 encoding?

On Wed, Jul 14, 2010 at 11:11 AM, Blargy <zm...@hotmail.com> wrote:

>
> Thanks for the reply, but that didn't help.
>
> Tomcat is accepting foreign characters, but for some reason when it reads
> the synonyms file and encounters the character ñ, it doesn't appear
> correctly in the Field Analysis admin. It shows up as �. If I query exactly
> for ñ it works, but the synonyms file is screwy.



-- 
Robert Muir
rcmuir@gmail.com

RE: Foreign characters question

Posted by Blargy <zm...@hotmail.com>.
Thanks for the reply, but that didn't help.

Tomcat is accepting foreign characters, but for some reason when it reads the
synonyms file and encounters the character ñ, it doesn't appear correctly
in the Field Analysis admin. It shows up as �. If I query exactly for ñ it
works, but the synonyms file is screwy.

RE: Foreign characters question

Posted by Tim Gilbert <TI...@morningstar.com>.
I had the same problem; the fix depends on which application server you are using.

If it's Tomcat, see the URI charset section of http://wiki.apache.org/solr/SolrTomcat.
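On the query side, the browser percent-encodes "ñ" as its UTF-8 bytes. Tomcat releases of that era decoded request URIs as ISO-8859-1 unless the HTTP connector's `URIEncoding` attribute was set to `UTF-8`, which is presumably what the URI charset section of that page covers. This sketch uses only Python's standard library (it is not Tomcat code) to show what each decoding does to the same query term:

```python
from urllib.parse import quote, unquote

term = "bañadores"

# A client percent-encodes the UTF-8 bytes of the term: ñ -> %C3%B1.
encoded = quote(term)
print(encoded)  # ba%C3%B1adores

# Decoding those bytes as ISO-8859-1 turns each byte into its own character.
as_latin1 = unquote(encoded, encoding="iso-8859-1")

# Decoding them as UTF-8 recovers the original term.
as_utf8 = unquote(encoded, encoding="utf-8")

print(ascii(as_latin1))  # 'ba\xc3\xb1adores'
print(as_utf8 == term)   # True
```

In this thread, querying for ñ directly already worked, so the URI setting was not the culprit; the synonyms file's encoding was.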

I use GlassFish, and I added this entry to the wiki after getting help from this group: http://wiki.apache.org/solr/SolrGlassfish

I hope this helps.

Tim
