Posted to solr-user@lucene.apache.org by Teruhiko Kurosaka <Ku...@basistech.com> on 2010/03/17 01:31:00 UTC

Solr query parser doesn't invoke analyzer for simple term query?

It seems that Solr's query parser doesn't pass a single term query
to the Analyzer for the field. For example, if I give it
2001年 (year 2001 in Japanese), the searcher returns 0 hits,
but if I quote it with double quotes, it returns hits.
In this experiment, I configured schema.xml so that
the field in question will use the morphological Analyzer 
my company makes that is capable of splitting 2001年  
into two tokens 2001 and 年.  I am guessing that this
Analyzer is called ONLY IF the term is a phrase.
Is my observation correct?

If so, is there any configuration parameter that I can tweak
to force every query on the text fields to be processed by
the Analyzer?

One might ask why users won't put a space between 2001 and 年.
Well, if they are clearly two separate words, people do that.
But 年 works more like a suffix in this case, and in many
Japanese speakers' minds, 2001年 seems like one token, so
many people won't.  (Remember that Japanese doesn't use spaces
in normal writing.)  Forcing the use of the Analyzer would also
be useful for compound word handling, which is often desirable
for languages like German.

----
Teruhiko "Kuro" Kurosaka
RLP + Lucene & Solr = powerful search for global contents

Re: Solr query parser doesn't invoke analyzer for simple term query?

Posted by Chris Hostetter <ho...@fucit.org>.

: It seems that Solr's query parser doesn't pass a single term query

no ... the query parser always uses the analyzer for "text" regardless of 
whether it's a single term or not (it doesn't even know if it's a single 
term until the Analyzer tells it)

cases where the analyzer isn't used are things like range queries, 
wildcards, or prefix queries.
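
As a rough illustration of the behavior described above (not Solr's or Lucene's actual code), here is a minimal Python sketch. Everything in it is hypothetical: `toy_analyzer` stands in for a real morphological Analyzer, and `TermQuery`, `PhraseQuery`, `UnanalyzedQuery`, and `parse_field_query` are plain Python stand-ins, not the Lucene classes of the same names. It assumes an analyzer that splits runs of digits from other characters, and it only models wildcard and prefix input as the unanalyzed case.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class TermQuery:
    field: str
    term: str


@dataclass
class PhraseQuery:
    field: str
    terms: List[str]


@dataclass
class UnanalyzedQuery:
    # Stand-in for wildcard/prefix queries, which skip the analyzer.
    field: str
    raw: str


Query = Union[TermQuery, PhraseQuery, UnanalyzedQuery]


def toy_analyzer(text: str) -> List[str]:
    # Hypothetical analyzer: separates runs of ASCII digits from runs of
    # other non-space characters, so "2001年" becomes ["2001", "年"].
    return re.findall(r"[0-9]+|[^\s0-9]+", text)


def parse_field_query(field: str, text: str) -> Optional[Query]:
    # Wildcard and prefix input is passed through without analysis.
    if "*" in text or "?" in text:
        return UnanalyzedQuery(field, text)
    # Everything else goes through the analyzer first; the parser only
    # learns how many tokens there are from the analyzer's output.
    tokens = toy_analyzer(text)
    if not tokens:
        return None
    if len(tokens) == 1:
        return TermQuery(field, tokens[0])
    return PhraseQuery(field, tokens)


print(parse_field_query("title_jpn", "2001年"))
print(parse_field_query("title_jpn", "solr"))
print(parse_field_query("title_jpn", "2001*"))
```

In this sketch the unquoted input 2001年 ends up as a two-term phrase query, while a single-token input becomes a plain term query and a wildcard input is never analyzed.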


-Hoss

Re: Solr query parser doesn't invoke analyzer for simple term query?

Posted by Chris Hostetter <ho...@fucit.org>.

: 
: Thank you, Marco.  I see the debug output that looks like:
: <str name="rawquerystring">title_jpn:2001年</str>
: <str name="querystring">title_jpn:2001年</str>
: <str name="parsedquery">PhraseQuery(title_jpn:"2001 年")</str>
: <str name="parsedquery_toString">title_jpn:"2001 年"</str>
	...
: Does this mean the standard query parser does send the
: raw query string to the Analyzer and (because the query
: yielded more than one token?) it uses phrase query?

correct.

: I guess the cause of my problem is somewhere else.

what does the debug output look like when you "quote" the input (you 
mentioned before that you got different results when using/omitting 
quotes)
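
One possible way to collect both debug outputs for comparison is to request the same query with and without quotes and with `debugQuery` turned on. The sketch below only builds the two request URLs. The base URL `http://localhost:8983/solr/select` and the helper name `build_debug_url` are assumptions for illustration; the field name `title_jpn` is taken from the debug output quoted above.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance with the default select handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_url(query: str, base_url: str = SOLR_SELECT_URL) -> str:
    # urlencode percent-encodes the query as UTF-8, so the Japanese
    # characters and the double quotes are safe to put in a URL.
    params = {"q": query, "debugQuery": "true"}
    return base_url + "?" + urlencode(params)


unquoted_url = build_debug_url("title_jpn:2001年")
quoted_url = build_debug_url('title_jpn:"2001年"')

print(unquoted_url)
print(quoted_url)
```

Fetching both URLs and comparing the `parsedquery` entries in the two responses should show whether the quoted and unquoted inputs are parsed into the same query.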



-Hoss

Re: Solr query parser doesn't invoke analyzer for simple term query?

Posted by Teruhiko Kurosaka <Ku...@basistech.com>.

Thank you, Marco.  I see the debug output that looks like:
<str name="rawquerystring">title_jpn:2001年</str>
<str name="querystring">title_jpn:2001年</str>
<str name="parsedquery">PhraseQuery(title_jpn:"2001 年")</str>
<str name="parsedquery_toString">title_jpn:"2001 年"</str>
<lst name="explain"/>
<str name="QParser">LuceneQParser</str>

Does this mean the standard query parser does send the
raw query string to the Analyzer and (because the query
yielded more than one token?) it uses phrase query?

I guess the cause of my problem is somewhere else.



----
Teruhiko "Kuro" Kurosaka
RLP + Lucene & Solr = powerful search for global contents

Re: Solr query parser doesn't invoke analyzer for simple term query?

Posted by Marco Martinez <mm...@paradigmatecnologico.com>.

Hello,

You can see what happens with this search (which analyzers are used for
this field and what their output is) on the analysis page of the
Solr default web page. I assume you are using the same analyzers and
tokenizers for indexing and searching on this field in your schema.

Regards,


Marco Martínez Bautista


