Posted to solr-user@lucene.apache.org by Scott Leonard <sc...@cnet.com> on 2007/01/30 00:20:36 UTC

Querying international characters

I have a mirror of the entire dmoz content in a solr index. International
characters seem to be loaded and returned in query results just fine, but
queries that _contain_ international characters return no results for known
matching patterns.

Is there a filter class I need to be using for international character
support?  Any other gotchas in supporting these characters in solr?

.scott


Re: Querying international characters

Posted by Chris Hostetter <ho...@fucit.org>.
: I have a mirror of the entire dmoz content in a solr index. International
: characters seem to be loaded and returned in query results just fine, but
: queries that _contain_ international characters return no results for known
: matching patterns.
:
: Is there a filter class I need to be using for international character
: support?  Any other gotchas in supporting these characters in solr?

there are a couple of things that might be going on...

1) at the moment, solr only really plays nicely with UTF-8 ... so if you
are dealing with another charset, that may be the origin of the issue...

2) the HTTP requests you are sending may not be encoding the characters
properly in the request ... what does your query URL look like?

Using the example schema, and searching for "LATIN SMALL LETTER E WITH
ACUTE" my URL looks like this...

http://localhost:8983/solr/select/?q=%C3%A9&version=2.2&start=0&rows=10&indent=on

and correctly finds the doc with id UTF8TEST

3) you may be using an Analyzer/TokenFilter that is stripping/replacing
your characters during analysis. Try using /solr/admin/analysis.jsp to see
what is getting indexed in each field when you put in your international
characters, and what tokens your query-time analyzer produces for your
input.
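
To illustrate point 2, here is a minimal Python sketch (not part of the
original reply) of how the query character ends up as %C3%A9 when it is
percent-encoded as UTF-8, and as %E9 if the client encodes it as ISO-8859-1
instead. The helper name solr_query_url is hypothetical; the host, path and
parameters are taken from the example URL above.

```python
from urllib.parse import quote, urlencode


def solr_query_url(base, text):
    """Build a select URL with the query text percent-encoded as UTF-8.

    Hypothetical helper for illustration; parameters mirror the example URL.
    """
    params = {"q": text, "version": "2.2", "start": 0, "rows": 10, "indent": "on"}
    return base + "?" + urlencode(params, encoding="utf-8")


e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# UTF-8 encodes the character as two bytes, 0xC3 0xA9.
utf8_form = quote(e_acute, encoding="utf-8")

# ISO-8859-1 encodes it as the single byte 0xE9, which is not valid UTF-8.
latin1_form = quote(e_acute, encoding="latin-1")

url = solr_query_url("http://localhost:8983/solr/select/", e_acute)

print(utf8_form)
print(latin1_form)
print(url)
```

If the URL a client actually sends contains %E9 rather than %C3%A9, the
request is probably not UTF-8 encoded, which would fit the symptom described.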


-Hoss


Re: SV: Querying international characters

Posted by Scott Leonard <sc...@cnet.com>.
i'm actually using resin here.


On 1/29/07 3:33 PM, "Antonio Eggberg" <an...@yahoo.se> wrote:

> Hi :
> 
> If you haven't done so, I think you need to enable UTF-8 support in your
> Tomcat/Jetty etc. for queries from web browsers. Have a look:
> 
> http://wiki.apache.org/tomcat/Tomcat/UTF-8
> 
> Regards
> 


SV: Querying international characters

Posted by Antonio Eggberg <an...@yahoo.se>.
Hi :

If you haven't done so, I think you need to enable UTF-8 support in your Tomcat/Jetty etc. for queries from web browsers. Have a look:

http://wiki.apache.org/tomcat/Tomcat/UTF-8
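
As a rough illustration of why the container setting matters (a Python
sketch, not Tomcat, Jetty or Resin code): assuming the client sends UTF-8
percent-encoded bytes but the server decodes the query string as ISO-8859-1,
the query text that reaches solr is two different characters and would not
match the indexed term.

```python
from urllib.parse import unquote

raw = "%C3%A9"  # UTF-8 percent-encoding of LATIN SMALL LETTER E WITH ACUTE

# Decoding with the charset the client used recovers the original character.
as_utf8 = unquote(raw, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 yields two unrelated characters.
as_latin1 = unquote(raw, encoding="latin-1")

print(repr(as_utf8), len(as_utf8))
print(repr(as_latin1), len(as_latin1))
```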

Regards

Scott Leonard <sc...@cnet.com> wrote: I have a mirror of the entire dmoz content in a solr index. International
characters seem to be loaded and returned in queries just fine but queries
that _contain_ international character queries return no results for known
matching patterns.

Is there a filter class I need to be using for international character
support?  Any other gotchas in supporting these characters in solr?

.scott



 		