Posted to solr-dev@lucene.apache.org by "Walter Underwood (JIRA)" <ji...@apache.org> on 2007/02/16 01:19:05 UTC

[jira] Created: (SOLR-161) Dangling dash causes stack trace

Dangling dash causes stack trace
--------------------------------

                 Key: SOLR-161
                 URL: https://issues.apache.org/jira/browse/SOLR-161
             Project: Solr
          Issue Type: Bug
          Components: search
    Affects Versions: 1.1.0
         Environment: Java 1.5, Tomcat 5.5.17, Fedora Core 4, Intel
            Reporter: Walter Underwood


I'm running tests from our search logs, and we have a query that ends in a dash. That caused a stack trace.

org.apache.lucene.queryParser.ParseException: Cannot parse 'digging for the truth -': Encountered "<EOF>" at line 1, column 23.
Was expecting one of:
    "(" ...
    <QUOTED> ...
    <TERM> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...
    
	at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:127)
	at org.apache.solr.request.DisMaxRequestHandler.handleRequest(DisMaxRequestHandler.java:272)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:595)
	at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:92)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-161) Dangling dash causes stack trace

Posted by "Mike Klaas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602038#action_12602038 ] 

Mike Klaas commented on SOLR-161:
---------------------------------

> It is really a Lucene query parser bug, but it wouldn't hurt to do s/(.*)-/&/ as a workaround. Assuming my ed(1) syntax is still fresh. Regardless, no query string should ever give a stack trace.

This might be hard to guarantee.  Already there are four issues detailing specific ways that dismax barfs on input.  A lot of the suggestions above take the form of detecting a specific failure mode and correcting it, which does not guarantee that you will catch them all.

A robust way to do it is to parse the query into an AST using a grammar that matches the query as well as possible (dropping the stuff that doesn't fit).  Unfortunately, this duplicates the Lucene parsing logic, and it would be nicer to add a "relaxed" mode to Lucene rather than pre-parsing the query.

(The reparse+reassemble method is what we use, btw.  It is written in Python but it might be possible to translate to Java.)
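
For illustration only, here is a minimal Python sketch of the reparse+reassemble idea. It is not the code referred to above, and the names TOKEN and relaxed_rewrite are made up. It assumes a deliberately tiny grammar (an optional +/- modifier followed by a quoted phrase or a bare word) and silently drops whatever does not fit, such as a dangling dash or an unbalanced quote. Other Lucene syntax characters are passed through untouched, so a real version would still need to escape them.

```python
import re

# One token: an optional +/- modifier, then a quoted phrase or a bare word.
# Input the pattern cannot match (a lone "-", an unbalanced quote) is skipped.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|([^\s"+-][^\s"]*))')

def relaxed_rewrite(query):
    """Re-parse a user query leniently and reassemble a well-formed string."""
    parts = []
    for modifier, phrase, word in TOKEN.findall(query):
        if phrase.strip():
            # Quoted phrase: keep the modifier and re-emit balanced quotes.
            parts.append('%s"%s"' % (modifier, phrase.strip()))
        elif word:
            # Bare word, possibly with a leading +/- modifier.
            parts.append(modifier + word)
    return " ".join(parts)

print(relaxed_rewrite('digging for the truth -'))
```

Because the output is rebuilt only from tokens the grammar recognised, a failure mode nobody anticipated is dropped rather than passed on to the real parser.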



[jira] Commented: (SOLR-161) Dangling dash causes stack trace

Posted by "Walter Underwood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473625 ] 

Walter Underwood commented on SOLR-161:
---------------------------------------

The parser can have a rule for this rather than exploding. A trailing dash is never meaningful and can be omitted, whether we're allowing +/- or not. Seems like a grammar bug to me. --wunder



[jira] Resolved: (SOLR-161) Dangling dash causes stack trace

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved SOLR-161.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.3
         Assignee: Otis Gospodnetic

It looks like SOLR-589 solves the problem Walter described here a year and a half ago.




[jira] Commented: (SOLR-161) Dangling dash causes stack trace

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473580 ] 

Hoss Man commented on SOLR-161:
-------------------------------

Hmm... yeah, that's the trade off of DisMax respecting + and - as special characters and not escaping them.

I guess we should add some preprocessing rules to deal with this ... i wonder if maybe we should just allow a regex to be specified in the init params for letting users strip arbitrary patterns.

Hmm... should a regex like that be applied before or after the call to partialEscape?



[jira] Commented: (SOLR-161) Dangling dash causes stack trace

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473622 ] 

Yonik Seeley commented on SOLR-161:
-----------------------------------

Strikes me as more of an implementation detail... most people aren't going to think about configuring regex rules for the dismax handler until something breaks.  Setting up regex rules that will make things better and not worse sounds hard too.  Should this really be externally configurable?  Perhaps some examples would make it clearer for me.



[jira] Commented: (SOLR-161) Dangling dash causes stack trace

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473603 ] 

Yonik Seeley commented on SOLR-161:
-----------------------------------

"do what i mean" isn't always easy.

If you want to allow people to express + and - like many search engines, perhaps only treat +/- as special if preceded by whitespace (or if they are the first character), and followed by non-whitespace?
The only exception to that off the top of my head would be a negative number.
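
As a rough sketch of that rule, assuming Python and made-up names (OPERATOR, escape_non_operators): a + or - is left alone when it sits at the start of the string or after whitespace and is followed by a non-whitespace character, and every other + or - is backslash-escaped. The negative-number exception is not handled here.

```python
import re

# A +/- counts as an operator only at the start of the string or after
# whitespace, and only when a non-whitespace character follows it.
OPERATOR = re.compile(r'(?:^|(?<=\s))[+-](?=\S)')

def escape_non_operators(query):
    """Backslash-escape every +/- that is not in operator position."""
    operator_positions = {m.start() for m in OPERATOR.finditer(query)}
    out = []
    for i, ch in enumerate(query):
        if ch in "+-" and i not in operator_positions:
            out.append("\\")
        out.append(ch)
    return "".join(out)

print(escape_non_operators('digging for the truth -'))
```

Under this rule the dangling dash from the report becomes an escaped literal, while "+solr -lucene" passes through unchanged.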



[jira] Commented: (SOLR-161) Dangling dash causes stack trace

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473610 ] 

Hoss Man commented on SOLR-161:
-------------------------------

i'm thinking the default should be to always assume they are special, but allow for regex-based rules that will preprocess the input and can handle the types of situations you describe.

(earlier i was thinking of a stripping regex -- but an ordered list of replacement regexes would be more flexible).
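
To make the "ordered list of replacement regexes" idea concrete, here is a small Python sketch. The rule list and the preprocess function are hypothetical, not an existing Solr option; in the proposal the pairs would come from the handler's init params.

```python
import re

# Hypothetical rule list: (pattern, replacement) pairs, applied in order.
RULES = [
    (re.compile(r'\s+'), ' '),          # collapse runs of whitespace
    (re.compile(r'\s*[+-]+\s*$'), ''),  # drop dangling +/- at the end
]

def preprocess(query, rules=RULES):
    """Apply each replacement rule in turn to the raw query string."""
    for pattern, replacement in rules:
        query = pattern.sub(replacement, query)
    return query

print(preprocess('digging  for the truth -'))
```

Each rule sees the output of the one before it, which is why order matters and why a list is more flexible than a single stripping pattern.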



[jira] Commented: (SOLR-161) Dangling dash causes stack trace

Posted by "Walter Underwood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473628 ] 

Walter Underwood commented on SOLR-161:
---------------------------------------

It is really a Lucene query parser bug, but it wouldn't hurt to do s/(.*)-/&/ as a workaround. Assuming my ed(1) syntax is still fresh. Regardless, no query string should ever give a stack trace. --wunder
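
A note on the substitution above: in ed and sed, & in the replacement stands for the whole matched text, so as written the dash would be put straight back. A back-reference to the group is presumably what was meant. Below is a minimal Python sketch of that intent, with a made-up function name. It is anchored to the end of the string, which the original expression is not, so that an interior dash such as the one in "wi-fi" is left alone.

```python
import re

def strip_trailing_dash(query):
    """Remove one dangling dash (and surrounding whitespace) at the end."""
    # The lazy group keeps everything before the final dash; \1 re-emits it.
    return re.sub(r'^(.*?)\s*-\s*$', r'\1', query)

print(strip_trailing_dash('digging for the truth -'))
```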



[jira] Commented: (SOLR-161) Dangling dash causes stack trace

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473627 ] 

Hoss Man commented on SOLR-161:
-------------------------------

A trailing dash is never meaningful, but should it be stripped, escaped, or treated as an error (assuming the error message is improved)?

As for examples of what i'm thinking:
  * a regex that escapes '-' if it appears before a sequence of digits
     (negative number? or prohibited number?)
  * a regex that escapes '"' if it appears at the end of a sequence of digits, and no 
     other instance of '"' precedes it in the string.
     (inches? or unterminated phrase query?)
  * a regex that strips (or escapes) '-' or '+' characters that are adjacent only to whitespace
     or the start/end tokens of the string
     (literals? garbage to be ignored? or malformed mandatory/prohibited modifiers?)

...basically any of the types of preprocessing i ever considered hardcoding into dismax, but then decided not to because i was afraid someone would say "that's not what i want, and there's no way to turn it off"
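
The three example rules might look roughly like this in Python. This is a sketch under assumptions: the function names are invented, the patterns are one possible reading of each bullet, and the third takes the "strip" option rather than "escape".

```python
import re

def escape_minus_before_digits(q):
    # Rule 1: escape '-' when it is directly followed by a digit.
    return re.sub(r'-(?=\d)', r'\\-', q)

def escape_inch_mark(q):
    # Rule 2: escape '"' that ends a run of digits, provided no other '"'
    # comes before it in the string.
    return re.sub(r'^([^"]*\d)"', r'\1\\"', q)

def strip_stray_modifiers(q):
    # Rule 3: strip '+'/'-' runs that touch only whitespace or the
    # start/end of the string.
    return re.sub(r'(?:^|(?<=\s))[+-]+(?=\s|$)', '', q)

print(escape_minus_before_digits('temperature -5'))
print(escape_inch_mark('27" monitor'))
print(strip_stray_modifiers('digging for the truth -'))
```

Each rule encodes one of the guesses in the parentheses above (negative number, inches, garbage to ignore), which is the argument for making them configurable rather than hardcoded.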

