Posted to solr-user@lucene.apache.org by anonymous user <an...@gmail.com> on 2013/05/17 18:53:57 UTC
Bugs with edismax parser
Hi Solr Users,
I've been migrating our existing app from Solr 3.6.2 to Solr 4.3 and I've
come across some strange behaviour that I think demonstrates one or more
bugs in the edismax parser.
-- Setup
----------------------------------------------------------------------
With a clean copy of solr-4.3.0, navigate to the example folder and start
up the "solr" example and index all of the documents in example/exampledocs:
java -jar start.jar
cd exampledocs && ./post.sh *.xml
-- BUG #1
---------------------------------------------------------------------
Issue the query:
http://localhost:8983/solr/query?q=("TFT+matrix"~1000)+OR+(700\:1)&defType=edismax&q.op=AND
You will find 0 results, but you should find the monitor listed in
"monitor.xml" because of the following line:
<field name="features">30" TFT active matrix LCD, 2560 x 1600, .25mm dot
pitch, 700:1 contrast</field>
If you add &debugQuery=true to the query, you will see the following:
"rawquerystring":"(\"TFT matrix\"~1000) OR (700\\:1)",
"querystring":"(\"TFT matrix\"~1000) OR (700\\:1)",
"parsedquery":"(+((DisjunctionMaxQuery((text:tft))
DisjunctionMaxQuery((((text:matrix text:1000)~2)))
DisjunctionMaxQuery((text:or)) DisjunctionMaxQuery((((text:700
text:1)~2))))~4))/no_coord",
"parsedquery_toString":"+(((text:tft) (((text:matrix text:1000)~2))
(text:or) (((text:700 text:1)~2)))~4)",
What's really interesting about this query is that the slop operator value
of 1000 becomes a search term. Because q.op is AND, nothing is found.
If you no longer escape the colon in 700\:1 and run the query:
http://localhost:8983/solr/query?q=("TFT+matrix"~1000)+OR+(700:1)&defType=edismax&q.op=AND
You will see it returns 2 results, and looking at the debugQuery data we
see that the slop operator value of 1000 is no longer treated as a search
term.
"rawquerystring":"(\"TFT matrix\"~1000) OR (700:1)",
"querystring":"(\"TFT matrix\"~1000) OR (700:1)",
"parsedquery":"(+(DisjunctionMaxQuery((text:\"tft matrix\"~1000))
DisjunctionMaxQuery((((text:700 text:1)~2)))))/no_coord",
"parsedquery_toString":"+((text:\"tft matrix\"~1000) (((text:700
text:1)~2)))",
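The difference between the two debug outputs above can be checked mechanically. The following is only a rough Python sketch, not part of Solr: the helper names bare_terms and leaked_tokens are made up for illustration, and the two strings are the parsedquery_toString values copied from the outputs above. It lists the bare field:term clauses and reports whether the slop value or the OR operator ended up as a search term.

```python
import re
from typing import List, Sequence

# parsedquery_toString values copied from the two debugQuery outputs above.
ESCAPED = ('+(((text:tft) (((text:matrix text:1000)~2)) '
           '(text:or) (((text:700 text:1)~2)))~4)')
UNESCAPED = '+((text:"tft matrix"~1000) (((text:700 text:1)~2)))'


def bare_terms(parsed: str) -> List[str]:
    """Return the terms of all unquoted field:term clauses.

    Quoted phrases such as text:"tft matrix"~1000 are skipped because the
    character after the colon is a quote, not a word character.
    """
    return re.findall(r'\w+:(\w+)', parsed)


def leaked_tokens(parsed: str,
                  suspects: Sequence[str] = ("1000", "or")) -> List[str]:
    """Return the suspects (slop value, operator) that appear as bare terms."""
    terms = bare_terms(parsed)
    return [s for s in suspects if s in terms]


if __name__ == "__main__":
    print(leaked_tokens(ESCAPED))    # ['1000', 'or']
    print(leaked_tokens(UNESCAPED))  # []
```

With the escaped colon, both "1000" and "or" show up as terms; without the escape, neither does.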
-- BUG #2
---------------------------------------------------------------------
Using the same setup as before, issue the query:
http://localhost:8983/solr/query?q=("matrix+TFT"~1000)+OR+("LCD")&defType=edismax&q.op=AND&debugQuery=true
You will get 5 results, as expected.
Now, add a "qf" setting with a nonexistent field, such as
&qf=text+doesnotexist, onto the query, so you end up with:
http://localhost:8983/solr/query?q=("matrix+TFT"~1000)+OR+("LCD")&defType=edismax&q.op=AND&debugQuery=true&qf=text+doesnotexist
You will get zero results. Looking at the debugQuery data we see the
following:
"rawquerystring":"(\"matrix TFT\"~1000) OR (\"LCD\")",
"querystring":"(\"matrix TFT\"~1000) OR (\"LCD\")",
"parsedquery":"(+((DisjunctionMaxQuery((text:matrix))
DisjunctionMaxQuery((((text:tft text:1000)~2)))
DisjunctionMaxQuery((text:or))
DisjunctionMaxQuery((text:lcd)))~4))/no_coord",
"parsedquery_toString":"+(((text:matrix) (((text:tft text:1000)~2))
(text:or) (text:lcd))~4)",
You'll notice that, once again, the slop operator's value of 1000 has
become a search term...
Another example of the same issue:
http://localhost:8983/solr/query?q="matrix"+OR+"LCD"&defType=edismax&q.op=AND&debugQuery=true&qf=text+doesnotexist
Still no results... Looking at the debugQuery data we see:
"rawquerystring":"\"matrix\" OR \"LCD\"",
"querystring":"\"matrix\" OR \"LCD\"",
"parsedquery":"(+((DisjunctionMaxQuery((text:matrix))
DisjunctionMaxQuery((text:or))
DisjunctionMaxQuery((text:lcd)))~3))/no_coord",
"parsedquery_toString":"+(((text:matrix) (text:or) (text:lcd))~3)",
You'll notice that "or" is being treated as a search term instead of an
operator.
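For anyone who wants to reproduce the Bug #2 comparison without hand-editing URLs, here is a minimal Python sketch that builds the request with and without the qf parameter. The host, handler path and parameter names are taken from the URLs above; the helper names edismax_url and params_of are hypothetical. It only builds and decodes URLs and does not contact Solr.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Host and handler path as used in the URLs in this post.
BASE = "http://localhost:8983/solr/query"


def edismax_url(q: str, qf: Optional[str] = None, debug: bool = True) -> str:
    """Build an edismax request URL with q.op=AND, percent-encoding all values."""
    params = [("q", q), ("defType", "edismax"), ("q.op", "AND")]
    if debug:
        params.append(("debugQuery", "true"))
    if qf:
        params.append(("qf", qf))
    return BASE + "?" + urlencode(params)


def params_of(url: str) -> Dict[str, List[str]]:
    """Decode the query string again, to confirm what the server would receive."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    query = '("matrix TFT"~1000) OR ("LCD")'
    print(edismax_url(query))                           # qf not set
    print(edismax_url(query, qf="text doesnotexist"))   # qf with a nonexistent field
```

The two URLs differ only in the trailing qf parameter, so any difference in parsedquery between the two responses comes from qf alone.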
-- Closing
--------------------------------------------------------------------
In both of these cases, the current behaviour is not ideal. Preferably the
parser should throw an exception or fail gracefully in the face of faulty
input.
Thanks,
Anonymous solr user