Posted to solr-user@lucene.apache.org by anonymous user <an...@gmail.com> on 2013/05/17 18:53:57 UTC

Bugs with edismax parser

Hi Solr Users,

I've been migrating our existing app from Solr 3.6.2 to Solr 4.3, and I've
come across some strange behaviour that I think demonstrates one or more
bugs in the edismax parser.

-- Setup
----------------------------------------------------------------------

With a clean copy of solr-4.3.0, navigate to the example folder, start up
the "solr" example, and then (from a second terminal) index all of the
documents in example/exampledocs:

java -jar start.jar
cd exampledocs && java -jar post.jar *.xml
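If you want to script the reproduction, note that the queries below contain quotes, a backslash and spaces, which need percent-encoding when sent from outside a browser. Here is a minimal Python sketch, assuming the example is listening on localhost:8983 with the /query handler used in this post. build_query_url is a hypothetical helper name, not part of Solr:

```python
from urllib.parse import urlencode

# Assumption: the stock example from the setup above, listening on
# localhost:8983 and exposing the /query handler used in this post.
SOLR_QUERY_URL = "http://localhost:8983/solr/query"

def build_query_url(q, **params):
    """Return a percent-encoded edismax query URL with q.op=AND.

    Extra keyword arguments (e.g. debugQuery="true") are appended as-is.
    "q.op" is set in the dict because it is not a valid Python keyword.
    """
    args = {"q": q, "defType": "edismax", "q.op": "AND"}
    args.update(params)
    return SOLR_QUERY_URL + "?" + urlencode(args)

# Bug #1 query. In q itself the colon is escaped with a single backslash.
url = build_query_url('("TFT matrix"~1000) OR (700\\:1)', debugQuery="true")
print(url)
```

For the Bug #2 variants, pass qf as well, e.g. build_query_url(q, debugQuery="true", qf="text doesnotexist").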


-- BUG #1
---------------------------------------------------------------------

Issue the query:

http://localhost:8983/solr/query?q=("TFT+matrix"~1000)+OR+(700\:1)&defType=edismax&q.op=AND

You will find 0 results, but you should find the monitor listed in
"monitor.xml" because of the following line:

<field name="features">30" TFT active matrix LCD, 2560 x 1600, .25mm dot pitch, 700:1 contrast</field>

If you add &debugQuery=true to the query, you will see the following:

"rawquerystring":"(\"TFT matrix\"~1000) OR (700\\:1)",
"querystring":"(\"TFT matrix\"~1000) OR (700\\:1)",
"parsedquery":"(+((DisjunctionMaxQuery((text:tft)) DisjunctionMaxQuery((((text:matrix text:1000)~2))) DisjunctionMaxQuery((text:or)) DisjunctionMaxQuery((((text:700 text:1)~2))))~4))/no_coord",
"parsedquery_toString":"+(((text:tft) (((text:matrix text:1000)~2)) (text:or) (((text:700 text:1)~2)))~4)",

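What's really interesting about this query is that the phrase has been
split into separate terms and the slop value of 1000 has become a search
term (text:1000), while "OR" is parsed as the term "or". Because q.op is
AND, all four clauses are required (~4), so nothing is found.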
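One way to spot this symptom in the debug output is to look for the slop value appearing as a plain field term. Below is a rough heuristic sketch in Python. slop_leaked_as_term is a hypothetical helper, and the check assumes the default field is text, as in the output above. It can misfire if the slop number is also a genuine query term:

```python
import re

def slop_leaked_as_term(parsed_query, slop, field="text"):
    """Return True if the phrase slop shows up as a standalone term
    (e.g. text:1000) instead of as a phrase slop (e.g. "tft matrix"~1000)."""
    pattern = rf"\b{re.escape(field)}:{re.escape(str(slop))}\b"
    return re.search(pattern, parsed_query) is not None

# parsedquery_toString values copied from the two Bug #1 runs.
escaped = '+(((text:tft) (((text:matrix text:1000)~2)) (text:or) (((text:700 text:1)~2)))~4)'
unescaped = '+((text:"tft matrix"~1000) (((text:700 text:1)~2)))'
print(slop_leaked_as_term(escaped, 1000))    # True
print(slop_leaked_as_term(unescaped, 1000))  # False
```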

If you don't escape the colon in 700\:1 and run this query instead:

http://localhost:8983/solr/query?q=("TFT+matrix"~1000)+OR+(700:1)&defType=edismax&q.op=AND

You will see that it returns 2 results, and the debugQuery data shows that
the slop value of 1000 is no longer treated as a search term:

"rawquerystring":"(\"TFT matrix\"~1000) OR (700:1)",
"querystring":"(\"TFT matrix\"~1000) OR (700:1)",
"parsedquery":"(+(DisjunctionMaxQuery((text:\"tft matrix\"~1000)) DisjunctionMaxQuery((((text:700 text:1)~2)))))/no_coord",
"parsedquery_toString":"+((text:\"tft matrix\"~1000) (((text:700 text:1)~2)))",
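To compare the escaped and unescaped variants side by side, a small helper can pull numFound and parsedquery_toString out of each response. This sketch assumes the URL is already percent-encoded, that debugQuery=true is set, and that the response is JSON (add wt=json if the handler does not default to it). fetch_json and summarize are hypothetical names:

```python
import json
from urllib.request import urlopen

def fetch_json(url):
    """Fetch a Solr response; assumes a percent-encoded URL returning JSON."""
    with urlopen(url) as resp:
        return json.load(resp)

def summarize(response):
    """Return (numFound, parsedquery_toString) from a Solr JSON response.

    The parsed query is None when the request was made without debugQuery.
    """
    debug = response.get("debug", {})
    return response["response"]["numFound"], debug.get("parsedquery_toString")
```

Calling summarize(fetch_json(url)) on percent-encoded versions of the two URLs above, against a running instance, should put the result counts and the two parsed queries next to each other.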

-- BUG #2
---------------------------------------------------------------------

Using the same setup as before, issue the query:

http://localhost:8983/solr/query?q=("matrix+TFT"~1000)+OR+("LCD")&defType=edismax&q.op=AND&debugQuery=true

You will get 5 results, as expected.

Now add a "qf" setting with a non-existent field, such as
&qf=text+doesnotexist, to the query, so you end up with:

http://localhost:8983/solr/query?q=("matrix+TFT"~1000)+OR+("LCD")&defType=edismax&q.op=AND&debugQuery=true&qf=text+doesnotexist

You will get zero results. Looking at the debugQuery data, we see the
following:

"rawquerystring":"(\"matrix TFT\"~1000) OR (\"LCD\")",
"querystring":"(\"matrix TFT\"~1000) OR (\"LCD\")",
"parsedquery":"(+((DisjunctionMaxQuery((text:matrix)) DisjunctionMaxQuery((((text:tft text:1000)~2))) DisjunctionMaxQuery((text:or)) DisjunctionMaxQuery((text:lcd)))~4))/no_coord",
"parsedquery_toString":"+(((text:matrix) (((text:tft text:1000)~2)) (text:or) (text:lcd))~4)",

You'll notice that, once again, the slop operator's value of 1000 has
become a search term...

Another example of the same issue:

http://localhost:8983/solr/query?q="matrix"+OR+"LCD"&defType=edismax&q.op=AND&debugQuery=true&qf=text+doesnotexist

Still no results... Looking at the debugQuery data we see:

"rawquerystring":"\"matrix\" OR \"LCD\"",
"querystring":"\"matrix\" OR \"LCD\"",
"parsedquery":"(+((DisjunctionMaxQuery((text:matrix)) DisjunctionMaxQuery((text:or)) DisjunctionMaxQuery((text:lcd)))~3))/no_coord",
"parsedquery_toString":"+(((text:matrix) (text:or) (text:lcd))~3)",

You'll notice that "or" is being treated as a search term instead of an
operator.
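A similar heuristic can flag boolean operators that ended up as search words. This is again a hypothetical helper, assuming the text field as above. It cannot tell an operator apart from a genuine search for the word "or":

```python
import re

# Operators that edismax should consume as query syntax (lowercased, since
# the analyzer lowercases terms before they appear in the parsed query).
OPERATORS = {"and", "or", "not"}

def operators_parsed_as_terms(parsed_query, field="text"):
    """Return the operator words that appear as plain terms on `field`, sorted."""
    terms = re.findall(rf"\b{re.escape(field)}:(\w+)", parsed_query)
    return sorted({t for t in terms if t in OPERATORS})

print(operators_parsed_as_terms('+(((text:matrix) (text:or) (text:lcd))~3)'))
# prints ['or']
```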

-- Closing
--------------------------------------------------------------------

In both of these cases, the current behaviour is not ideal. Preferably, the
parser should either throw an exception or fail gracefully when given
faulty input.

Thanks,
Anonymous solr user