
Posted to solr-user@lucene.apache.org by entdeveloper <ca...@gmail.com> on 2011/06/28 04:57:28 UTC

Analyzer creates PhraseQuery

I have an analyzer set up in my schema like so:

  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="2"/>
  </analyzer>

The problem: if I index a term like "toys and dolls" and then search for
"to", I get no matches. The debug output in Solr gives me:

<str name="rawquerystring">to</str>
<str name="querystring">to</str>
<str name="parsedquery">PhraseQuery(autocomplete:"t o to")</str>
<str name="parsedquery_toString">autocomplete:"t o to"</str>

So it looks like the Lucene query parser is turning the query into a
PhraseQuery for some reason. The explain output seems to confirm that this
PhraseQuery is what's causing my document not to match:

0.0 = (NON-MATCH) weight(autocomplete:"t o to" in 82), product of:
  1.0 = queryWeight(autocomplete:"t o to"), product of:
    6.684934 = idf(autocomplete: t=60 o=68 to=14)
    0.1495901 = queryNorm
  0.0 = fieldWeight(autocomplete:"t o to" in 82), product of:
    0.0 = tf(phraseFreq=0.0)
    6.684934 = idf(autocomplete: t=60 o=68 to=14)
    0.1875 = fieldNorm(field=autocomplete, doc=82)

But why? This seems like it should match to me, and indeed the Solr analysis
tool highlights the matches (see image), so something isn't lining up right.

http://lucene.472066.n3.nabble.com/file/n3116288/Screen_shot_2011-06-27_at_7.55.49_PM.png 

In case you're wondering, I'm trying to implement a semi-advanced
autocomplete feature that goes beyond what a simple EdgeNGram analyzer
could do.


--
View this message in context: http://lucene.472066.n3.nabble.com/Analyzer-creates-PhraseQuery-tp3116288p3116288.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Analyzer creates PhraseQuery

Posted by entdeveloper <ca...@gmail.com>.

Thanks guys. Both the PositionFilterFactory and the
autoGeneratePhraseQueries=false solutions solved the issue.


Re: Analyzer creates PhraseQuery

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.

(11/06/28 16:40), lboutros wrote:
> You could add this filter after the NGram filter to prevent the phrase query
> creation :
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
>
> Ludovic.

There is also an option on the field type that avoids producing phrase queries: autoGeneratePhraseQueries="false".
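
As a rough sketch of where that attribute would go: it is set on the fieldType element in schema.xml, assuming a Solr version that supports it (3.1 or later). The fieldType name "autocomplete_ngram" below is hypothetical, since the thread never shows the fieldType element; only the analyzer chain is taken from the original post.

```xml
<!-- Hypothetical fieldType name; the analyzer chain is copied from the original post. -->
<fieldType name="autocomplete_ngram" class="solr.TextField"
           autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="2"/>
  </analyzer>
</fieldType>
```

With this setting the query parser should stop wrapping the grams t, o and to in a PhraseQuery and combine them as separate term clauses instead. The exact parsed form is not shown in this thread, so it is worth checking with debugQuery=on.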

koji
-- 
http://www.rondhuit.com/en/

Re: Analyzer creates PhraseQuery

Posted by lboutros <bo...@gmail.com>.

You could add this filter after the NGram filter to prevent the phrase query
from being created:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
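
A minimal sketch of that suggestion, assuming the filter is applied only on the query side and that the Solr version in use still ships PositionFilterFactory. The fieldType name "autocomplete_ngram" and the split into index and query analyzers are assumptions for illustration; the original post uses a single analyzer.

```xml
<!-- Hypothetical fieldType name; index-time chain is the one from the original post. -->
<fieldType name="autocomplete_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="2"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="2"/>
    <!-- Sets the position increment of every token after the first to 0,
         so the grams no longer look like a sequence of phrase terms. -->
    <filter class="solr.PositionFilterFactory"/>
  </analyzer>
</fieldType>
```

Because all query grams then share one position, the parser has no sequence to turn into a phrase, and a search for "to" can match any document containing one of the grams.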

Ludovic.

-----
Jouve
France.

Re: Analyzer creates PhraseQuery

Posted by Sujatha Arun <su...@gmail.com>.

Separate the analyzer into an index-time analyzer with the NGramFilterFactory
and a query-time analyzer without it.

Right now your query is run through the same analyzer, which produces more
than one token for the given keyword, and so the result is a phrase query.
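
Something along these lines, as a sketch only. The fieldType name "autocomplete_ngram" is hypothetical; the filters are the ones from the original post, with the NGram filter left out of the query chain.

```xml
<!-- Hypothetical fieldType name; filters are taken from the original post. -->
<fieldType name="autocomplete_ngram" class="solr.TextField">
  <!-- Index time: break the whole lowercased value into 1- and 2-character grams. -->
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="2"/>
  </analyzer>
  <!-- Query time: keep the input as one lowercased token, so no phrase is built. -->
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that with this layout the query token has to equal one of the indexed grams, so a query longer than maxGramSize (2 here) will not match anything. maxGramSize would need to be raised to cover the longest prefix you want to support.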

Regards
Sujatha

On Tue, Jun 28, 2011 at 11:09 AM, Mohammad Shariq <sh...@gmail.com>wrote:

> I guess 'to' may be listed in 'stopWords' .

Re: Analyzer creates PhraseQuery

Posted by Mohammad Shariq <sh...@gmail.com>.

I guess 'to' may be listed in your stopwords.

-- 
Thanks and Regards
Mohammad Shariq