Posted to solr-user@lucene.apache.org by entdeveloper <ca...@gmail.com> on 2011/12/15 21:25:29 UTC

edismax doesn't obey 'pf' parameter

If I switch back and forth between defType=dismax and defType=edismax,
edismax doesn't seem to obey my pf parameter. I dug through the code a
little, and in ExtendedDismaxQParserPlugin (Solr 3.4/Solr 3.5) the
part that is supposed to add the phrase query is here:

Query phrase = pp.parse(userPhraseQuery.toString());

The code in the parse method tries to create a Query against a null field,
and then the phrase does not get added to the mainQuery.

Is this a known bug or am I missing something in my configuration? My config
is very simple:

<requestHandler class="solr.StandardRequestHandler" name="/search">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <str name="qf">name</str>
    <str name="pf">name_exact^2</str>
    <str name="fl">id,name,image_url,url</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>


--
View this message in context: http://lucene.472066.n3.nabble.com/edismax-doesn-t-obey-pf-parameter-tp3589763p3589763.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: edismax doesn't obey 'pf' parameter

Posted by Chris Hostetter <ho...@fucit.org>.
: Of course. What I meant to say was there is
: always exactly one token in a non-tokenized
: field and its offset is always exactly 0. There
: will never be tokens at position 1.
: 
: So asking to match phrases, which is based on
: term positions, is basically a no-op.

That's not always true.

Consider a situation where you have a multivalued "author_exact" field
containing the author's full name as a literal string (using either
StrField or TextField w/KeywordTokenizer), copyFielded from an
"author" field which is similar but tokenized.

So if a document contains the following two values in the author field...
	"David Smiley"
	"Eric Pugh"

then that document should be matched by all three of these queries...

defType=edismax&q=David&qf=author&pf=author_exact
defType=edismax&q=David+Pugh&qf=author&pf=author_exact
defType=edismax&q=David+Smiley&qf=author&pf=author_exact

...but it should score *really* high for that last query because it not
only matches on the author field, but it also gets an exact match on the
entire query string as an implicit phrase in the author_exact field.
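
To make the scenario above concrete, here is a toy model in plain Python. It is only a sketch: the helper names are hypothetical and not Solr APIs, values are lowercased for simplicity (a real StrField would not lowercase), and scoring is not modeled at all. It just shows which of the three queries would match on the tokenized field and which would also hit the exact field.

```python
# Toy model only: plain Python standing in for Solr's analysis chain.
# All function names here are hypothetical, not Solr APIs.

def tokenized_terms(values):
    """Mimic a tokenized 'author' field: lowercase, split on whitespace."""
    return {tok for v in values for tok in v.lower().split()}

def exact_terms(values):
    """Mimic a StrField / KeywordTokenizer 'author_exact' field:
    each value becomes one single term (lowercased here for simplicity)."""
    return {v.lower() for v in values}

def matches_qf(query, values):
    """Every query word must appear somewhere in the tokenized field."""
    terms = tokenized_terms(values)
    return all(word in terms for word in query.lower().split())

def matches_pf(query, values):
    """The whole query string must equal one value of the exact field."""
    return query.lower() in exact_terms(values)

authors = ["David Smiley", "Eric Pugh"]

for q in ("David", "David Pugh", "David Smiley"):
    # Prints the query, whether qf matches, and whether pf also matches.
    print(q, matches_qf(q, authors), matches_pf(q, authors))
```

All three queries match on the tokenized field, but only "David Smiley" also matches the exact field, which is where the extra pf boost would come from.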

Dismax does behave this way, as you can see using the 3.5 example configs 
& data (note that "cat" is a StrField)...

http://localhost:8983/solr/select/?debugQuery=true&defType=dismax&qf=name^5+features^3&pf=features^2+cat^4&q=hard+drive
<str name="parsedquery">
  +((DisjunctionMaxQuery((features:hard^3.0 | name:hard^5.0)) 
     DisjunctionMaxQuery((features:drive^3.0 | name:drive^5.0))
    )~2) 
   DisjunctionMaxQuery((features:"hard drive"^2.0 | cat:hard drive^4.0))
</str>


But for some reason EDismax doesn't behave similarly...

http://localhost:8983/solr/select/?debugQuery=true&defType=edismax&qf=name^5+features^3&pf=features^2+cat^4&q=hard+drive
<str name="parsedquery">
  +((DisjunctionMaxQuery((features:hard^3.0 | name:hard^5.0)) 
     DisjunctionMaxQuery((features:drive^3.0 | name:drive^5.0))
    )~2) 
   DisjunctionMaxQuery((features:"hard drive"^2.0))
</str>

...that definitely seems like a bug to me, but it's not entirely clear
why it's happening (the pf-related code in edismax is kind of hairy).
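
For anyone comparing the two outputs by hand, a small Python sketch can pull the field names out of the phrase clause and diff them. It assumes, as in the debug output quoted in this message, that the pf clause is the last DisjunctionMaxQuery in the parsedquery string; `phrase_fields` is a hypothetical helper, not part of Solr.

```python
import re

def phrase_fields(parsedquery):
    """Return the field names in the last DisjunctionMaxQuery clause,
    assumed here to be the pf (phrase boost) clause."""
    start = parsedquery.rfind("DisjunctionMaxQuery((")
    clause = parsedquery[start:]
    # A field name is a word followed by ':' right after the opening
    # '((' or right after a '|' separator.
    return set(re.findall(r'(?:\(\(|\|\s*)(\w+):', clause))

# parsedquery strings copied from the dismax and edismax runs above,
# joined onto one line each.
dismax = ('+((DisjunctionMaxQuery((features:hard^3.0 | name:hard^5.0)) '
          'DisjunctionMaxQuery((features:drive^3.0 | name:drive^5.0)))~2) '
          'DisjunctionMaxQuery((features:"hard drive"^2.0 | cat:hard drive^4.0))')
edismax = ('+((DisjunctionMaxQuery((features:hard^3.0 | name:hard^5.0)) '
           'DisjunctionMaxQuery((features:drive^3.0 | name:drive^5.0)))~2) '
           'DisjunctionMaxQuery((features:"hard drive"^2.0))')

# Fields that dismax put in the phrase clause but edismax dropped.
missing = phrase_fields(dismax) - phrase_fields(edismax)
print(sorted(missing))
```

With the two strings above this reports `cat`, the StrField, as the one pf field that edismax left out.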

https://issues.apache.org/jira/browse/SOLR-2988

-Hoss

Re: edismax doesn't obey 'pf' parameter

Posted by Erick Erickson <er...@gmail.com>.
That was a little confusing!

" there's always exactly one token at position 0."

Of course. What I meant to say was there is
always exactly one token in a non-tokenized
field and its offset is always exactly 0. There
will never be tokens at position 1.

So asking to match phrases, which is based on
term positions, is basically a no-op.
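
A minimal Python illustration of the position argument, assuming a simple whitespace split stands in for a tokenized field and the untouched value stands in for a non-tokenized one (both functions are hypothetical stand-ins, not Solr analyzers):

```python
def whitespace_tokens(value):
    """Tokenized field: one token per word, at positions 0, 1, 2, ..."""
    return list(enumerate(value.split()))

def keyword_tokens(value):
    """Non-tokenized field: the whole value is one token at position 0."""
    return [(0, value)]

print(whitespace_tokens("mickey mouse"))  # [(0, 'mickey'), (1, 'mouse')]
print(keyword_tokens("mickey mouse"))     # [(0, 'mickey mouse')]
```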

Hope that makes more sense.
Erick

On Fri, Dec 16, 2011 at 11:44 AM, Erick Erickson
<er...@gmail.com> wrote:
> A side note: specifying qt and defType on the same query is probably
> not what you intend. I'd just omit the qt bit since you're essentially
> passing all the info you intend explicitly...
>
> I see the same behavior when I specify a non-tokenized field in 3.5.
>
> But I don't think this is a bug: it doesn't make sense to use a
> non-tokenized field as a phrase field, since there's always exactly one
> token at position 0. The whole idea of phrases is that multiple tokens
> must appear within the slop.
>
> Best
> Erick
>
> On Thu, Dec 15, 2011 at 5:46 PM, entdeveloper
> <ca...@gmail.com> wrote:
>> I'm observing strange results with both the correct and incorrect behavior
>> happening depending on which field I put in the 'pf' param. I wouldn't think
>> this should be analyzer specific, but is it?
>>
>> If I try:
>> http://localhost:8080/solr/collection1/select?qt=%2Fsearch&q=mickey%20mouse&debugQuery=on&defType=edismax&pf=blah_exact&qf=blah
>>
>> It looks correct:
>> <str name="rawquerystring">mickey mouse</str>
>> <str name="querystring">mickey mouse</str>
>> <str name="parsedquery">+((DisjunctionMaxQuery((blah:mickey))
>> DisjunctionMaxQuery((blah:mouse)))~2)
>> DisjunctionMaxQuery((blah_exact:"mickey mouse"))</str>
>> <str name="parsedquery_toString">+(((blah:mickey) (blah:mouse))~2)
>> (blah_exact:"mickey mouse")</str>
>>
>> However, if I put in the field I want, for some reason that phrase portion
>> of the query just completely drops off:
>> http://localhost:8080/solr/collection1/select?qt=%2Fsearch&q=mickey%20mouse&debugQuery=on&defType=edismax&pf=name_exact&qf=name
>>
>> Results:
>> <str name="rawquerystring">mickey mouse</str>
>> <str name="querystring">mickey mouse</str>
>> <str name="parsedquery">+((DisjunctionMaxQuery((name:mickey))
>> DisjunctionMaxQuery((name:mouse)))~2) ()</str>
>> <str name="parsedquery_toString">+(((name:mickey) (name:mouse))~2) ()</str>
>>
>> The name_exact field's analyzer uses KeywordTokenizer, but again, I think
>> this query is being formed too early in the process for that to matter at
>> this point

Re: edismax doesn't obey 'pf' parameter

Posted by Erick Erickson <er...@gmail.com>.
A side note: specifying qt and defType on the same query is probably
not what you intend. I'd just omit the qt bit since you're essentially
passing all the info you intend explicitly...

I see the same behavior when I specify a non-tokenized field in 3.5.

But I don't think this is a bug: it doesn't make sense to use a
non-tokenized field as a phrase field, since there's always exactly one
token at position 0. The whole idea of phrases is that multiple tokens
must appear within the slop.

Best
Erick

On Thu, Dec 15, 2011 at 5:46 PM, entdeveloper
<ca...@gmail.com> wrote:
> I'm observing strange results with both the correct and incorrect behavior
> happening depending on which field I put in the 'pf' param. I wouldn't think
> this should be analyzer specific, but is it?
>
> If I try:
> http://localhost:8080/solr/collection1/select?qt=%2Fsearch&q=mickey%20mouse&debugQuery=on&defType=edismax&pf=blah_exact&qf=blah
>
> It looks correct:
> <str name="rawquerystring">mickey mouse</str>
> <str name="querystring">mickey mouse</str>
> <str name="parsedquery">+((DisjunctionMaxQuery((blah:mickey))
> DisjunctionMaxQuery((blah:mouse)))~2)
> DisjunctionMaxQuery((blah_exact:"mickey mouse"))</str>
> <str name="parsedquery_toString">+(((blah:mickey) (blah:mouse))~2)
> (blah_exact:"mickey mouse")</str>
>
> However, if I put in the field I want, for some reason that phrase portion
> of the query just completely drops off:
> http://localhost:8080/solr/collection1/select?qt=%2Fsearch&q=mickey%20mouse&debugQuery=on&defType=edismax&pf=name_exact&qf=name
>
> Results:
> <str name="rawquerystring">mickey mouse</str>
> <str name="querystring">mickey mouse</str>
> <str name="parsedquery">+((DisjunctionMaxQuery((name:mickey))
> DisjunctionMaxQuery((name:mouse)))~2) ()</str>
> <str name="parsedquery_toString">+(((name:mickey) (name:mouse))~2) ()</str>
>
> The name_exact field's analyzer uses KeywordTokenizer, but again, I think
> this query is being formed too early in the process for that to matter at
> this point

Re: edismax doesn't obey 'pf' parameter

Posted by entdeveloper <ca...@gmail.com>.
I'm observing strange results with both the correct and incorrect behavior
happening depending on which field I put in the 'pf' param. I wouldn't think
this should be analyzer specific, but is it?

If I try:
http://localhost:8080/solr/collection1/select?qt=%2Fsearch&q=mickey%20mouse&debugQuery=on&defType=edismax&pf=blah_exact&qf=blah

It looks correct:
<str name="rawquerystring">mickey mouse</str>
<str name="querystring">mickey mouse</str>
<str name="parsedquery">+((DisjunctionMaxQuery((blah:mickey))
DisjunctionMaxQuery((blah:mouse)))~2)
DisjunctionMaxQuery((blah_exact:"mickey mouse"))</str>
<str name="parsedquery_toString">+(((blah:mickey) (blah:mouse))~2)
(blah_exact:"mickey mouse")</str>

However, if I put in the field I want, for some reason that phrase portion
of the query just completely drops off:
http://localhost:8080/solr/collection1/select?qt=%2Fsearch&q=mickey%20mouse&debugQuery=on&defType=edismax&pf=name_exact&qf=name

Results:
<str name="rawquerystring">mickey mouse</str>
<str name="querystring">mickey mouse</str>
<str name="parsedquery">+((DisjunctionMaxQuery((name:mickey))
DisjunctionMaxQuery((name:mouse)))~2) ()</str>
<str name="parsedquery_toString">+(((name:mickey) (name:mouse))~2) ()</str>

The name_exact field's analyzer uses KeywordTokenizer, but again, I think
this query is being formed too early in the process for that to matter.
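
As a sanity check that the two requests above really differ only in the fields, here is a short Python sketch that decodes both URLs with the standard library and prints the parameters whose values differ. The `params` helper is just for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def params(url):
    """Decode a request URL's query string into a flat dict."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}

base = ("http://localhost:8080/solr/collection1/select"
        "?qt=%2Fsearch&q=mickey%20mouse&debugQuery=on&defType=edismax")
ok_url = base + "&pf=blah_exact&qf=blah"    # phrase clause present
bad_url = base + "&pf=name_exact&qf=name"   # phrase clause dropped

a, b = params(ok_url), params(bad_url)
# Keep only the parameters whose values differ between the two requests.
diff = {k: (a[k], b[k]) for k in a if a[k] != b[k]}
print(diff)  # {'pf': ('blah_exact', 'name_exact'), 'qf': ('blah', 'name')}
```

Everything else (qt, q, debugQuery, defType) is identical, so whatever drops the phrase clause depends on the pf/qf fields themselves.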

--
View this message in context: http://lucene.472066.n3.nabble.com/edismax-doesn-t-obey-pf-parameter-tp3589763p3590153.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: edismax doesn't obey 'pf' parameter

Posted by Chris Hostetter <ho...@fucit.org>.
: If I switch back and forth between defType=dismax and defType=edismax, the
: edismax doesn't seem to obey my pf parameter. I dug through the code a

I just tried a sample query using Solr 3.5 with the example configs+data.

This is the query i tried...

http://localhost:8983/solr/select/?debugQuery=true&defType=edismax&qf=name^5+features^3&pf=features^4&q=this+document

echoParams shows...

<lst name="params">
  <str name="qf">name^5 features^3</str>
  <str name="pf">features^4</str>
  <str name="debugQuery">true</str>
  <str name="q">this document</str>
  <str name="defType">edismax</str>
</lst>

...it matched the document i expected, and the debug info showed the query 
structure i expected...

<str name="parsedquery">
 +((DisjunctionMaxQuery((features:this^3.0 | name:this^5.0)) 
    DisjunctionMaxQuery((features:document^3.0 | name:document^5.0))
   )~2
  ) 
  DisjunctionMaxQuery((features:"this document"^4.0))
</str>

: Is this a known bug or am I missing something in my configuration? My config
: is very simple:
: 
:  <requestHandler class="solr.StandardRequestHandler" name="/search">
:     <lst name="defaults">
:       <str name="echoParams">explicit</str>
:       <str name="defType">edismax</str>
:       <str name="qf">name</str>
:       <str name="pf">name_exact^2</str>
:       <str name="fl">id,name,image_url,url</str>
:       <str name="q.alt">*:*</str>
:     </lst>
:   </requestHandler>

what did your request look like with that config?  what did the debugQuery 
output look like?
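
If it helps to capture both of those, here is a rough Python sketch for building the debug request and pulling parsedquery out of the XML response. The host, port and core path are taken from the URLs posted earlier in this thread; `debug_url` and `parsed_query` are hypothetical helper names, and the sample response is an abbreviated stand-in for a real Solr reply, not actual server output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def debug_url(base, **params):
    """Build a request URL, turning on debugQuery unless already set."""
    params.setdefault("debugQuery", "true")
    return base + "?" + urlencode(params)

def parsed_query(response_xml):
    """Return the parsedquery string from an XML response, or None."""
    root = ET.fromstring(response_xml)
    node = root.find(".//str[@name='parsedquery']")
    # Collapse the line breaks and indentation Solr puts in the output.
    return None if node is None else " ".join(node.text.split())

url = debug_url("http://localhost:8080/solr/collection1/select",
                qt="/search", q="mickey mouse")
print(url)

# Abbreviated, hand-written sample of the debug section of a response.
sample = ('<response><lst name="debug"><str name="parsedquery">'
          '+((DisjunctionMaxQuery((name:mickey)) '
          'DisjunctionMaxQuery((name:mouse)))~2) ()'
          '</str></lst></response>')
print(parsed_query(sample))
```

Fetching the URL (for example with urllib.request.urlopen) and passing the body to `parsed_query` would give the string to paste into a reply.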


-Hoss