
Posted to solr-user@lucene.apache.org by Simon Wistow <si...@thegestalt.org> on 2009/11/10 01:06:06 UTC

Oddness with Phrase Query

I have a document with the title "Here, there be dragons" and a body.

When I search for 

q  = Here, there be dragons
qf = title^2.0 body^0.8
qt = dismax

Which is parsed as 

+DisjunctionMaxQuery((content:"here dragon"^0.8 | title:"here dragon"^2.0)~0.01) ()

I get the document as the first hit, which is what I'd expect.

However, if I change the query to

q  = "Here, there be dragons"

(with quotes)

which is parsed as

+DisjunctionMaxQuery((content:"here dragon"^0.8 | title:"here dragon"^2.0)~0.01) ()

then I don't get the document at all, which is not what I'd expect.

I've tried modifying the phrase slop but still don't get any results 
back.

Am I doing something wrong - do I have to have an untokenized copy of 
fields lying around?

Thanks,

Simon

Re: Oddness with Phrase Query

Posted by Simon Wistow <si...@thegestalt.org>.

On Mon, Nov 23, 2009 at 12:10:42PM -0800, Chris Hostetter said:
> ...hmm, you shouldn't have to reindex everything.  are you sure you 
> restarted solr after making the enablePositionIncrements="true" change to 
> the query analyzer?

Yup - definitely restarted
 
> what do the offsets look like when you go to analysis.jsp and paste in that 
> sentence?

org.apache.solr.analysis.StopFilterFactory
{words=stopwords.txt, ignoreCase=true, enablePositionIncrements=true}

term position:     1       4
term text:         Here    Dragons
term type:         word    word
source start,end:  0,4     14,21
payload:
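To make the table above concrete, here is a small Python sketch of what a stop filter does to token positions with and without enablePositionIncrements. It is a toy model, not Solr's StopFilter: the tokenizer is a simple regex, and STOPWORDS is a hypothetical two-word stand-in for stopwords.txt.

```python
import re

# Hypothetical stand-in for stopwords.txt; the real file isn't shown here.
STOPWORDS = {"there", "be"}

def analyze(text, enable_position_increments=True):
    """Tokenize on letters, drop stopwords case-insensitively, and return
    (position, term) pairs with 1-based positions, as analysis.jsp shows."""
    tokens = re.findall(r"[A-Za-z]+", text)
    out = []
    position = 0
    skipped = 0
    for tok in tokens:
        if tok.lower() in STOPWORDS:
            skipped += 1  # remember how many stopwords were removed
            continue
        if enable_position_increments:
            position += 1 + skipped  # keep a gap where stopwords were
        else:
            position += 1  # close the gap: tokens look adjacent
        skipped = 0
        out.append((position, tok))
    return out

print(analyze("Here there be dragons", True))   # [(1, 'Here'), (4, 'dragons')]
print(analyze("Here there be dragons", False))  # [(1, 'Here'), (2, 'dragons')]
```

If the index analyzer keeps the gap (positions 1 and 4) but the query analyzer closes it (positions 1 and 2), an exact phrase built from the query tokens no longer lines up with what was indexed.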


> the other thing to consider: you can increase the slop value on that
> phrase query (to allow looser matching) using the "qs" param (query slop) 
> ... that could help in this situation (stop words getting stripped out of 
> the query) as well as other situations (e.g. what if the user just types 
> "here be dragons" -- with or without stop words)

After fiddling with the position increments stuff I upped the query 
slop to 2, which now seems to provide better results, but I'm worried 
about that affecting relevancy elsewhere (which I presume is the reason 
why it's not the default value).

If that's the case - is it worth writing something for my app so that if 
it detects a phrase query with lots of stop words it ups the phrase 
slop?
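One possible shape for that idea, as a Python sketch: count the stopwords inside a quoted query and use that count as the dismax "qs" value, with a cap. The STOPWORDS set, the cap of 3, and the helper names are hypothetical choices for illustration, and whether this actually helps relevancy would need testing against real queries.

```python
# Hypothetical stopword subset; ideally load the same stopwords.txt Solr uses.
STOPWORDS = {"a", "an", "and", "be", "of", "the", "there", "to"}

def phrase_slop_for(query, base_slop=0, max_slop=3):
    """Suggest a "qs" value: one extra unit of slop per stopword inside the
    quoted phrase, capped so long phrases don't match too loosely."""
    words = query.strip().strip('"').replace(",", " ").split()
    stop_count = sum(1 for w in words if w.lower() in STOPWORDS)
    return min(base_slop + stop_count, max_slop)

def build_params(query):
    """Request params for a dismax search; only quoted queries get a qs."""
    params = {"q": query, "qt": "dismax", "qf": "title"}
    if query.strip().startswith('"'):
        params["qs"] = str(phrase_slop_for(query))
    return params

print(build_params('"Here, there be dragons"'))
```

For "Here, there be dragons" this picks qs=2, the same value that worked by hand above.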

Either way, it seems to be working now. Thanks for all the help,

Simon

Re: Oddness with Phrase Query

Posted by Chris Hostetter <ho...@fucit.org>.

: ?q="Here there be dragons"
: &qt=dismax
: &qf=title
	...
: +DisjunctionMaxQuery((title:"here dragon")~0.01) ()

...the quotes cause the entire string to be passed to the analyzer for 
the title field and the resulting Tokens are used to construct a phrase 
query.

: ?q=Here there be dragons
: &qt=dismax
: &qf=title
	...
: +((DisjunctionMaxQuery((title:here)~0.01) 
: DisjunctionMaxQuery((title:dragon)~0.01))~2) ()

...the lack of quotes just results in two term queries, both of which must 
match, but each can match anywhere in the string.
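A toy Python model of that difference: the unquoted form only needs both terms somewhere in the field, while the quoted form needs them at the same relative positions the query analyzer produced. TITLE_POSITIONS is a hypothetical one-document index shaped like the analysis.jsp output elsewhere in this thread (after stemming), and the "~2" is read here as both clauses being required.

```python
# Hypothetical positional index for one title: stopwords removed at index
# time with the gap kept, so "here" is at 1 and "dragon" at 4.
TITLE_POSITIONS = {"here": [1], "dragon": [4]}

def term_match(terms):
    """Unquoted dismax with both clauses required: each term must occur somewhere."""
    return all(t in TITLE_POSITIONS for t in terms)

def exact_phrase_match(query_terms):
    """Quoted query: terms must sit at the same relative positions as in the
    query. query_terms is a list of (position, term) from the query analyzer."""
    first_pos, first_term = query_terms[0]
    for start in TITLE_POSITIONS.get(first_term, []):
        offset = start - first_pos
        if all(offset + pos in TITLE_POSITIONS.get(term, []) for pos, term in query_terms):
            return True
    return False

print(term_match(["here", "dragon"]))                    # True
print(exact_phrase_match([(1, "here"), (2, "dragon")]))  # False: gap closed at query time
print(exact_phrase_match([(1, "here"), (4, "dragon")]))  # True: gap kept on both sides
```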

: It looks like it might be related to 
	...
: http://issues.apache.org/jira/browse/SOLR-879
: 
: Although I added enablePositionIncrements="true" to
: 
: <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
: 
: into the <analyzer type="query"> for <fieldType name="text"> in the 
: schema which didn't fix it - I presume this means that I have to reindex 
: everything (although the StopFilterFactory in <analyzer type="index"> 
: already had it).

...hmm, you shouldn't have to reindex everything.  are you sure you 
restarted solr after making the enablePositionIncrements="true" change to 
the query analyzer?

what do the offsets look like when you go to analysis.jsp and paste in that 
sentence?

the other thing to consider: you can increase the slop value on that
phrase query (to allow looser matching) using the "qs" param (query slop) 
... that could help in this situation (stop words getting stripped out of 
the query) as well as other situations (e.g. what if the user just types 
"here be dragons" -- with or without stop words)
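The arithmetic behind "qs" can be sketched with a simplified model of Lucene's sloppy phrase matching, assuming each term appears exactly once in the field; the real scorer also handles repeated terms and scoring weights, which this ignores. Under this model, an index that kept the stopword gap and a query that closed it are off by exactly 2.

```python
def required_slop(doc_positions, query_positions):
    """Shift each term's document position by its position in the query,
    then measure the spread of the shifted values. In this simplified model
    the phrase matches when the result is <= the slop (the qs param)."""
    shifted = [doc_positions[t] - q for t, q in query_positions.items()]
    return max(shifted) - min(shifted)

doc = {"here": 1, "dragon": 4}  # index side kept the stopword gap

print(required_slop(doc, {"here": 1, "dragon": 2}))  # 2: query side closed the gap
print(required_slop(doc, {"here": 1, "dragon": 4}))  # 0: both sides keep the gap
print(required_slop(doc, {"here": 1, "dragon": 3}))  # 1: "here be dragons", gap kept
```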



-Hoss

Re: Oddness with Phrase Query

Posted by Simon Wistow <si...@thegestalt.org>.

On Tue, Nov 17, 2009 at 11:09:38AM -0800, Chris Hostetter said:
> 
> Several things about your message don't make sense...

Hmm, sorry - a byproduct of building up the mail over time I think.

The query 

?q="Here there be dragons"
&fl=id,title,score
&debugQuery=on
&qt=dismax
&qf=title

gets echoed as 

<lst name="params">
 <str name="qf">title</str>
 <str name="fl">id,title,score</str>
 <str name="debugQuery">on</str>
 <str name="q">"Here there be dragons"</str>
 <str name="qt">dismax</str>
</lst>

and gets parsed as 

+DisjunctionMaxQuery((title:"here dragon")~0.01) ()

and gets no results.


Whereas 

?q=Here there be dragons
&fl=id,title,score
&debugQuery=on
&qt=dismax
&qf=title

gets echoed as 

<lst name="params">
<str name="debugQuery">on</str>
<str name="fl">id,title,score</str>
<str name="q">Here, there be dragons</str>
<str name="qf">title</str>
<str name="qt">dismax</str>
</lst>

and parsed as 

+((DisjunctionMaxQuery((title:here)~0.01) 
DisjunctionMaxQuery((title:dragon)~0.01))~2) ()

gets one result:

<doc>
<float name="score">6.3863463</float>
<str name="id">20980889</str>
<str name="title">Zelazny, Roger - Here There Be Dragons</str>
</doc>
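The "~0.01" on each DisjunctionMaxQuery in the parsed output is the dismax tie-breaker. As a rough Python sketch (the per-field scores are made-up inputs, and field boosts such as title^2.0 are assumed to be already folded into them), a disjunction-max score is the best field score plus the tie-breaker times the sum of the other fields' scores:

```python
def dismax_score(field_scores, tie=0.01):
    """Best per-field score plus `tie` times the sum of the remaining
    per-field scores; an empty disjunction scores 0 here."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

print(dismax_score([3.0, 1.0], tie=0.5))  # 3.5
```

With a tie-breaker as small as 0.01 the best-matching field dominates, and with a single field in "qf" (as in the queries above) the score is just that field's score.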


It looks like it might be related to 

SOLR-879: Enable position increments in the query parser and fix the
          example schema to enable position increments for the stop 
          filter in both the index and query analyzers to fix the bug 
          with phrase queries with stopwords. (yonik)

http://issues.apache.org/jira/browse/SOLR-879

Although I added enablePositionIncrements="true" to

<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

into the <analyzer type="query"> for <fieldType name="text"> in the 
schema which didn't fix it - I presume this means that I have to reindex 
everything (although the StopFilterFactory in <analyzer type="index"> 
already had it).

Re: Oddness with Phrase Query

Posted by Chris Hostetter <ho...@fucit.org>.

Several things about your message don't make sense...

1) the field names listed in your "qf" don't match up to the field names 
in the generated query.toString() ... suggesting that they come from 
different examples

2) the query.toString() output from each of your queries is identical, 
and yet you claim both the input q strings and the results were different.

3) the query.toString() you provided for the "unquoted" example doesn't 
make sense -- there's no reason the dismax parser would have generated a 
query structure like that unless your query string was already quoted.

...all of this suggests a major disconnect between what you actually tried 
and what you've pasted in your email ... probably just the result of 
trying different things, getting confused about the results, and then 
mixing them up when asking your question.

can you try testing the various options again, and please include the 
actual, raw URLs along with the cut/paste output from debugQuery (we 
probably don't need the explain section)
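For building those raw URLs reproducibly, here is a small Python sketch using urllib.parse.urlencode, which percent-encodes the quotes, commas, and spaces that the hand-typed URLs in this thread leave unencoded. The base URL is an assumption about a default local install; adjust host, port, and path to match yours.

```python
from urllib.parse import urlencode

# Assumed base URL for a default local install (hypothetical).
SOLR_SELECT = "http://localhost:8983/solr/select"

def debug_url(q, qf="title"):
    """Build a dismax request URL with debugQuery on, params in a fixed order."""
    params = [
        ("q", q),
        ("fl", "id,title,score"),
        ("debugQuery", "on"),
        ("qt", "dismax"),
        ("qf", qf),
    ]
    return SOLR_SELECT + "?" + urlencode(params)

print(debug_url('"Here there be dragons"'))
print(debug_url("Here there be dragons"))
```

Generating both URLs from the same function keeps everything except "q" identical, which makes the quoted and unquoted runs directly comparable.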

It would also be helpful to see the schema.xml section for the 
fields/fieldtypes you are using.


: When I search for 
: 
: q  = Here, there be dragons
: qf = title^2.0 body^0.8
: qt = dismax
: 
: Which is parsed as 
: 
: +DisjunctionMaxQuery((content:"here dragon"^0.8 | title:"here dragon"^2.0)~0.01) ()
: 
: I get the document as the first hit, which is what I'd expect.
: 
: However, if I change the query to 
: 
: q  = "Here, there be dragons"
: 
: (with quotes)
: 
: which is parsed as
: 
: +DisjunctionMaxQuery((content:"here dragon"^0.8 | title:"here dragon"^2.0)~0.01) ()
: 
: then I don't get the document at all, which is not what I'd expect.



-Hoss