Posted to dev@lucene.apache.org by "Jan Høydahl (JIRA)" <ji...@apache.org> on 2013/02/25 23:24:13 UTC

[jira] [Comment Edited] (SOLR-4480) EDisMax parser blows up with query containing single plus or minus

    [ https://issues.apache.org/jira/browse/SOLR-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586368#comment-13586368 ] 

Jan Høydahl edited comment on SOLR-4480 at 2/25/13 10:22 PM:
-------------------------------------------------------------

So let's take the String field example. A single %2B (a URL-encoded "+") crashes the Lucene query parser, and since eDisMax just passes it straight through, it crashes eDisMax too.

For the Lucene parser, it crashes for all query strings *ending* in a single "+":
http://localhost:8983/solr/select?debug=query&q=foo%20%2B
but not for queries where there is whitespace after the "+":
http://localhost:8983/solr/select?debug=query&q=%2B%20foo

eDisMax is a bit different. It does not crash on a trailing "+", but it swallows it:
http://localhost:8983/solr/select?debug=query&defType=edismax&df=foo_s&q=%2B%20hello%20%2B
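
For reference, the q values in the three URLs above decode to the query strings "foo +", "+ foo" and "+ hello +". A minimal Java sketch to check this with the standard java.net.URLDecoder (the class name QueryParamDecoder is made up for illustration and is not part of Solr):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, only for inspecting the example URLs in this comment.
final class QueryParamDecoder {

    /** Decodes a percent-encoded parameter value as UTF-8. */
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {"foo%20%2B", "%2B%20foo", "%2B%20hello%20%2B"};
        for (String sample : samples) {
            // Brackets make leading and trailing characters easy to see.
            System.out.println("[" + decode(sample) + "]");
        }
    }
}
```

Running main prints [foo +], [+ foo] and [+ hello +], so the first and third queries are the ones that end in a bare "+".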

This is probably because lines 700-703 are too quick to assume that a + or - means MUST or NOT:
{code}
      if (ch=='+' || ch=='-') {
        clause.must = ch;
        pos++;
      }
{code}

I'm OK with saying that a single plus or minus should mean literal matching (given that the field type supports it), and thus we add escaping. But then we should do the same for a plus or minus at the end of a query string.
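
To make the idea concrete, here is a rough sketch of such escaping, written as a pre-processing step on the raw query string. It is not the attached patch and not the existing eDisMax code; the class and method names are hypothetical. It assumes that a "single" plus or minus is one with whitespace or a string boundary on both sides, which covers both the lone "+" query and the trailing "+" case:

```java
// Hypothetical sketch of the proposal, not taken from Solr or from SOLR-4480.patch.
final class LoneOperatorEscaper {

    /**
     * Puts a backslash in front of every '+' or '-' that forms a
     * whitespace-delimited token on its own. Operators attached to a
     * term, such as "+foo" or "-bar", are left untouched.
     */
    static String escapeLoneOperators(String query) {
        StringBuilder out = new StringBuilder(query.length() + 4);
        for (int i = 0; i < query.length(); i++) {
            char ch = query.charAt(i);
            boolean isOperator = ch == '+' || ch == '-';
            // A token starts at the beginning of the string or after whitespace...
            boolean startsToken = i == 0
                    || Character.isWhitespace(query.charAt(i - 1));
            // ...and ends at the end of the string or before whitespace.
            boolean endsToken = i == query.length() - 1
                    || Character.isWhitespace(query.charAt(i + 1));
            if (isOperator && startsToken && endsToken) {
                out.append('\\'); // treat the lone operator as a literal
            }
            out.append(ch);
        }
        return out.toString();
    }
}
```

With this, escapeLoneOperators("+ hello +") gives \+ hello \+, while "+foo -bar" comes back unchanged. An already escaped \+ is also left alone, because the character before the "+" is a backslash rather than whitespace. The sketch makes no attempt to treat quoted phrases specially.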
                
      was (Author: janhoy):
    So let's take the String field example. A single %2B crashes the Lucene query parser, and since we just pass it straight through it crashes eDisMax too.

For the Lucene parser, it crashes for all query strings *ending* in a single "+"
http://localhost:8983/solr/select?debug=query&q=foo%20%2B
but not for queries where there is a whitespace after the "+"
http://localhost:8983/solr/select?debug=query&q=%2B%20foo

eDismax is a bit different. It does not crash on ending "+" but it swallows it:
http://localhost:8983/solr/select?debug=query&defType=edismax&df=foo_s&q=%2B%20hello%20%2B

This is due to line 700-703 being too quick at guessing that the + or - means MUST or NOT
{code}
      if (ch=='+' || ch=='-') {
        clause.must = ch;
        pos++;
      }
{code}

I'm ok with saying that a single "+" or "-" should mean literal matching (given that field type supports it), and thus we translate '+'->'\+'. But then we should do the same for the "+" or "-" at the end of a query string.
                  
> EDisMax parser blows up with query containing single plus or minus
> ------------------------------------------------------------------
>
>                 Key: SOLR-4480
>                 URL: https://issues.apache.org/jira/browse/SOLR-4480
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>            Reporter: Fiona Tay
>            Priority: Critical
>             Fix For: 4.2, 5.0
>
>         Attachments: SOLR-4480.patch, SOLR-4480.patch
>
>
> We are running solr with sunspot and when we set up a query containing a single plus, Solr blows up with the following error:
> SOLR Request (5.0ms)  [ path=#<RSolr::Client:0x4c7464ac> parameters={data: fq=type%3A%28Attachment+OR+User+OR+GpdbDataSource+OR+HadoopInstance+OR+GnipInstance+OR+Workspace+OR+Workfile+OR+Tag+OR+Dataset+OR+HdfsEntry%29&fq=type_name_s%3A%28Attachment+OR+User+OR+Instance+OR+Workspace+OR+Workfile+OR+Tag+OR+Dataset+OR+HdfsEntry%29&fq=-%28security_type_name_sm%3A%28Dataset%29+AND+-instance_account_ids_im%3A%282+OR+1%29%29&fq=-%28security_type_name_sm%3AChorusView+AND+-member_ids_im%3A1+AND+-public_b%3Atrue%29&fq=-%28security_type_name_sm%3A%28Dataset%29+AND+-instance_account_ids_im%3A%282+OR+1%29%29&fq=-%28security_type_name_sm%3AChorusView+AND+-member_ids_im%3A1+AND+-public_b%3Atrue%29&q=%2B&fl=%2A+score&qf=name_texts+first_name_texts+last_name_texts+file_name_texts&defType=edismax&hl=on&hl.simple.pre=%40%40%40hl%40%40%40&hl.simple.post=%40%40%40endhl%40%40%40&start=0&rows=3, method: post, params: {:wt=>:ruby}, query: wt=ruby, headers: {"Content-Type"=>"application/x-www-form-urlencoded; charset=UTF-8"}, path: select, uri: http://localhost:8982/solr/select?wt=ruby, open_timeout: , read_timeout: } ]
> RSolr::Error::Http (RSolr::Error::Http - 400 Bad Request
> Error:     org.apache.lucene.queryParser.ParseException: Cannot parse '': Encountered "<EOF>" at line 1, column 0.
> Was expecting one of:
>     <NOT> ...
>     "+" ...
>     "-" ...
>     "(" ...
>     "*" ...
>     <QUOTED> ...
>     <TERM> ...
>     <PREFIXTERM> ...
>     <WILDTERM> ...
