You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by bu...@apache.org on 2004/03/06 09:36:05 UTC

DO NOT REPLY [Bug 27491] New: - [PATCH] Allowing '-'/'+' in terms

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=27491>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=27491

[PATCH] Allowing '-'/'+' in terms

           Summary: [PATCH] Allowing '-'/'+' in terms
           Product: Lucene
           Version: unspecified
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: QueryParser
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: morus.walter@gmx.de


I suggest to change the definition of term character in QueryParser.jj
from 
| <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> ) >
to
| <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" | "+" ) >

As a result query parser will read '-' and '+' within words (such as tft-monitor
or Sysh1-1) as one term, which will be tokenized by the used analyzer
and end up in a term query or phrase query depending if it create one ore
more tokens.
So with StandardAnalyzer a query tft-monitor would get a phrase query "tft
monitor" and Sysh1-1 a term query for "Sysh1-1". 
Searching tft-monitor as a phrase "tft monitor" is not exact but the best
aproximation possible once you indexed tft-monitor as tokens tft and monitor.
Currently query parser interpret every '-' or '+' as operators, which means
that 'tft-monitor' gets parsed as tft AND NOT monitor, which probably isn't what
 the user wanted.
The effect of '-'/'+' not occuring within a word is not changed, so
tft -monitor will still search for 'tft AND NOT monitor'.

All regression tests pass with the change.

I didn't add a patch-file, because I think it's easy to change queryParser.jj by
hand.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org