Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2013/11/09 12:47:18 UTC

[jira] [Commented] (LUCENE-5336) Add a simple QueryParser to parse human-entered queries.

    [ https://issues.apache.org/jira/browse/LUCENE-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818114#comment-13818114 ] 

Michael McCandless commented on LUCENE-5336:
--------------------------------------------

This is AWESOME.  I love how the operators (even whitespace!) are
optional. And I love the name :)  And it's great that it NEVER throws
an exception no matter how awful the input is.  And I love that it does not
use a lexer/parser generator: this makes it much more approachable
to those devs that don't have experience with parser generators.

Small javadoc fix: instead of "any {@code -} characters beyond the
first character in a term may not need to be escaped," I think it
should say "any {@code -} characters beyond the first character do not
need to be escaped" (and the same for the * operator)?

How does it handle malformed input, e.g. a missing closing " for a
phrase query?  If I enter "foo bar, will it just make a term query for
"foo and a term query for bar?  Or does it strip that " and query
foo instead?  (Same for a missing closing paren?)  It looks like it
drops the " and ( and does a simple term query (good).
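
To make the "drop the stray quote" behavior concrete, here is a minimal
sketch of that kind of lenient handling.  It is NOT the code from the
attached patch: the class name LenientPhraseSketch, the tokenize method,
and the TERM:/PHRASE: labels are all hypothetical, and parentheses and
the other operators are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from LUCENE-5336.patch.
final class LenientPhraseSketch {

    // Splits input into "TERM:x" and "PHRASE:x y" labels without ever throwing.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            if (c == '"') {
                int close = input.indexOf('"', i + 1);
                if (close < 0) {
                    // No closing quote: drop the stray quote and keep
                    // reading the rest of the input as plain terms.
                    i++;
                    continue;
                }
                String phrase = input.substring(i + 1, close).trim();
                if (!phrase.isEmpty()) {
                    out.add("PHRASE:" + phrase);
                }
                i = close + 1;
                continue;
            }
            // Plain term: read up to the next whitespace or quote.
            int start = i;
            while (i < n && !Character.isWhitespace(input.charAt(i))
                    && input.charAt(i) != '"') {
                i++;
            }
            out.add("TERM:" + input.substring(start, i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("\"foo bar"));        // [TERM:foo, TERM:bar]
        System.out.println(tokenize("\"foo bar\" baz"));  // [PHRASE:foo bar, TERM:baz]
    }
}
```

With this approach an unterminated phrase degrades to ordinary term
queries instead of an error, which matches the behavior described above.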

Maybe you could add fangs to the random test by more frequently mixing
in these operator characters ...
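
One possible way to do that, as a rough sketch: generate random query
strings where each character is an operator with some fixed probability.
The class OperatorHeavyQueries, the randomQuery method, and the
operatorChance parameter are made-up names for illustration; the actual
test in the patch may be structured differently.

```java
import java.util.Random;

// Hypothetical helper; not part of LUCENE-5336.patch.
final class OperatorHeavyQueries {

    // Operator, escape, and whitespace characters from the issue description.
    static final char[] OPERATORS = {
        '+', '|', '-', '"', '*', '(', ')', '\\', ' ', '\t', '\n', '\r'
    };

    // Builds a string of up to maxLength characters. Each character is an
    // operator with probability operatorChance, otherwise a letter a-z.
    static String randomQuery(Random random, int maxLength, double operatorChance) {
        int length = random.nextInt(maxLength + 1);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            if (random.nextDouble() < operatorChance) {
                sb.append(OPERATORS[random.nextInt(OPERATORS.length)]);
            } else {
                sb.append((char) ('a' + random.nextInt(26)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random random = new Random(42L);
        for (int i = 0; i < 5; i++) {
            String query = randomQuery(random, 20, 0.5);
            // A real test would pass `query` to the parser here and check
            // that parsing completes without throwing.
            System.out.println(query.length() + " chars generated");
        }
    }
}
```

Raising operatorChance toward 1.0 produces inputs that are almost
entirely operators, which is where unbalanced quotes and parens are most
likely to show up.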


> Add a simple QueryParser to parse human-entered queries.
> --------------------------------------------------------
>
>                 Key: LUCENE-5336
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5336
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Jack Conradson
>         Attachments: LUCENE-5336.patch
>
>
> I would like to add a new simple QueryParser to Lucene that is designed to parse human-entered queries.  This parser will operate on an entire entered query using a specified single field or a set of weighted fields (using term boost).
> All features/operations in this parser can be enabled or disabled depending on what is necessary for the user.  A default operator may be specified as either 'MUST' representing 'and' or 'SHOULD' representing 'or.'  The features/operations that this parser will include are the following:
> * AND specified as '+'
> * OR specified as '|'
> * NOT specified as '-'
> * PHRASE surrounded by double quotes
> * PREFIX specified as '*'
> * PRECEDENCE surrounded by '(' and ')'
> * WHITESPACE specified as ' ' '\n' '\r' and '\t' will cause the default operator to be used
> * ESCAPE specified as '\' will allow operators to be used in terms
> The key differences between this parser and other existing parsers will be the following:
> * No exceptions will be thrown, and errors in syntax will be ignored.  The parser will do a best-effort interpretation of any query entered.
> * It uses minimal syntax to express queries.  All available operators are single characters or pairs of single characters.
> * The parser is hand-written and in a single Java file making it easy to modify.



