Posted to solr-user@lucene.apache.org by Pradeep Pujari <Pr...@rocketmail.com> on 2011/05/21 21:30:30 UTC

chinese SOLR query parser

Hi,

I changed schema.xml to use CJKAnalyzer. Is there anything else I need to change in solrconfig.xml for the query parser component? I do not get any results back when searching. It looks like the Chinese characters are being encoded in a way that cannot match what is in the index. Any help is highly appreciated.

Thanks
Pradeep.

Re: chinese SOLR query parser

Posted by Michael McCandless <lu...@mikemccandless.com>.
Not that I know of... and I'm no expert on it!!  I know there are at
least two possibilities -- ChineseAnalyzer / CJKAnalyzer (from trunk's
modules/analysis), but I don't know the tradeoffs of each.

Hopefully others will chime in here?

However, once you do figure out a good schema, could you please post
back?  I'd like to add it to Solr's example schema as an example field
type (text_example_zh?).

Mike

http://blog.mikemccandless.com

On Sat, May 21, 2011 at 7:20 PM, Andy <an...@yahoo.com> wrote:
> Is there any example schema for Chinese that I could use as a guide right now?

Re: chinese SOLR query parser

Posted by Andy <an...@yahoo.com>.
Is there any example schema for Chinese that I could use as a guide right now?

Thanks



Re: chinese SOLR query parser

Posted by Michael McCandless <lu...@mikemccandless.com>.
Unfortunately, Solr's defaults (example schema) are unusable for
non-whitespace languages... see:

    http://markmail.org/thread/ww6mhfi3rfpngmc5

So it could be you need to turn off autoGeneratePhraseQueries in your
fieldType?  We are working towards fixing the example schema (for
3.2/4.0) in https://issues.apache.org/jira/browse/SOLR-2519 ...
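
As a rough illustration of the suggestion above, a field type with phrase-query generation turned off might look like the fragment below. This is only a sketch under assumptions: it presumes a Solr version whose schema supports the autoGeneratePhraseQueries attribute, the name "text_cjk" is made up for this example, and solr.CJKTokenizerFactory is just one possible tokenizer choice, not something this thread settled on.

```xml
<!-- schema.xml: hypothetical field type; the name "text_cjk" is illustrative only. -->
<fieldType name="text_cjk" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer>
    <!-- Splits CJK text into overlapping two-character tokens (bigrams). -->
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With the attribute set to false, a run of Chinese characters that the analyzer splits into several tokens should be searched as separate terms rather than as one implicit phrase. Documents need to be reindexed after changing the analyzer of a field.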

Also, it could be that your web/app server is not using UTF-8 character
encoding, e.g. Tomcat defaults to ISO-8859-1 -- see
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
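
For Tomcat, the setting discussed on that FAQ page is the URIEncoding attribute of the HTTP connector. A minimal sketch of the change in conf/server.xml is shown below; the port and timeout values are placeholders copied from a typical default install, so keep whatever your existing Connector element already uses and only add the URIEncoding attribute.

```xml
<!-- conf/server.xml: port, timeout and redirect values are placeholders. -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"/>
```

This affects how the query string of GET requests is decoded, which is where a q= parameter containing Chinese characters would otherwise be misread as ISO-8859-1. Tomcat must be restarted for the change to take effect.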

Mike

http://blog.mikemccandless.com
