Posted to solr-user@lucene.apache.org by Jonathan Rochkind <ro...@jhu.edu> on 2011/03/16 22:29:08 UTC

dismax parser, parens, what do they do exactly

It looks like Dismax query parser can somehow handle parens, used for
applying, for instance, + or - to a group, distributing it. But I'm not
sure what effect they have on the overall query.

For instance, if I give dismax this:

book (dog +( cat -frog))

debugQuery shows:

+((DisjunctionMaxQuery((text:book)~0.01)
+DisjunctionMaxQuery((text:dog)~0.01)
DisjunctionMaxQuery((text:cat)~0.01)
-DisjunctionMaxQuery((text:frog)~0.01))~2) ()


How will that be treated by mm?  Let's say I have an mm of 50%.  Does
that apply to the "top-level", like either "book" needs to match or
"+(dog +( cat -frog))" needs to match?  And for "+(dog +( cat -frog))"
to match, do just 50% of that subquery need to match... or is mm ignored
there?  Or something else entirely?

Can anyone clear this up?

Continuing to try experimentally to clear it up... it _looks_ like the mm actually applies to each _individual_ low-level query.  So even though the semantics of:

book (dog +( cat -frog))

are respected, if mm is 50%, the nesting is irrelevant: exactly 50% of "book", "dog", "+cat", and "+-frog" (distributing the operators through, I guess?) are required. I think. I'm getting confused even talking about it.




Re: dismax parser, parens, what do they do exactly

Posted by Jonathan Rochkind <ro...@jhu.edu>.
Thanks Hoss, this is very helpful. Okay, so dismax is not intended to do
anything with parens for semantics; they're just like any other char,
handled by analyzers.

I think you're right, I cut and pasted the wrong query before. Just for
the record, on 1.4.1:

qf=text
pf=
q=book (dog +(cat -frog))

<str name="parsedquery">
+((DisjunctionMaxQuery((text:book)~0.01) 
DisjunctionMaxQuery((text:dog)~0.01) 
DisjunctionMaxQuery((text:cat)~0.01) 
-DisjunctionMaxQuery((text:frog)~0.01))~3) ()
</str>

<str name="parsedquery_toString">
+(((text:book)~0.01 (text:dog)~0.01 (text:cat)~0.01 -(text:frog)~0.01)~3) ()
</str>
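
For what it's worth, the ~3 in that output is the minimum-should-match count on the flat list of optional clauses (book, dog, cat; frog is prohibited, so it isn't counted). Here is a rough sketch of how I understand a percentage mm to be turned into that count. The function name is made up, this is not Solr's code, and it assumes a non-negative percentage that is rounded down:

```python
def min_should_match(optional_clauses, mm_percent):
    """Hypothetical helper: how many optional clauses must match for a
    non-negative percentage mm. Assumes the result is rounded down."""
    if mm_percent < 0:
        raise ValueError("this sketch only covers non-negative percentages")
    # Integer arithmetic truncates, which rounds down for non-negative values.
    return optional_clauses * mm_percent // 100


# Three optional clauses, as in the parsed query above.
print(min_should_match(3, 100))
print(min_should_match(3, 50))
```

Under those assumptions, 100% of three optional clauses gives 3, and 50% gives 1. The nesting implied by the parens never enters into it.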



Re: dismax parser, parens, what do they do exactly

Posted by Chris Hostetter <ho...@fucit.org>.
: It looks like Dismax query parser can somehow handle parens, used for
: applying, for instance, + or - to a group, distributing it. But I'm not
: sure what effect they have on the overall query.

parens are treated like any regular character -- they have no semantic 
meaning.

what may be confusing you is what the *analyzer* you have configured for 
your query field then does with the paren.

For instance, using the example schema on trunk, try the same query...

	q = book (dog +(cat -frog))

..but using a "qf" param containing a string field (no analysis) ...

/select?defType=dismax&q=book+(dog+%2B(cat+-frog))&tie=0.01&qf=text_s&debugQuery=true

It produces the following output (i've added some whitespace)...

<str name="parsedquery">
 +(  DisjunctionMaxQuery((  text_s:book    )~0.01) 
     DisjunctionMaxQuery((  text_s:(dog    )~0.01) 
    +DisjunctionMaxQuery((  text_s:(cat    )~0.01) 
    -DisjunctionMaxQuery((  text_s:frog))  )~0.01)
  ) 
  ()
</str>

...the parens from your query are being treated literally as characters in 
your terms.  you just don't see them in the parsed query because it shows 
you what those terms look like after the analysis.
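
if it helps, here is a very simplified model of that behaviour: split on whitespace, peel off a leading + or -, and leave everything else (parens included) in the term. this is only a sketch, not the actual dismax code; split_clauses is a made-up name, and it ignores quoted phrases and escaping entirely.

```python
def split_clauses(q):
    """Hypothetical model: split on whitespace and separate a leading
    + or - operator from the rest of each token. Parens get no special
    treatment and stay in the term."""
    clauses = []
    for token in q.split():
        if len(token) > 1 and token[0] in "+-":
            occur, term = token[0], token[1:]
        else:
            occur, term = "", token
        clauses.append((occur, term))
    return clauses


clauses = split_clauses("book (dog +(cat -frog))")
for occur, term in clauses:
    # Prints the operator (empty string for optional) and the raw term.
    print(repr(occur), repr(term))
```

the terms that come out are "book", "(dog", "(cat" (mandatory) and "frog))" (prohibited), which lines up with the string-field output above. an analyzed field would then strip the parens from each term.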

Incidentally...

: debugQuery shows:
: 
: +((DisjunctionMaxQuery((text:book)~0.01)
: +DisjunctionMaxQuery((text:dog)~0.01)
: DisjunctionMaxQuery((text:cat)~0.01)
: -DisjunctionMaxQuery((text:frog)~0.01))~2) ()

...double check that, it doesn't seem to match the query string you posted 
(it shows "dog" being mandatory.  i'm guessing you cut/paste the wrong 
example)


-Hoss