Posted to dev@lucene.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2013/10/30 17:11:33 UTC
[jira] [Updated] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

     [ https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated LUCENE-5205:
--------------------------------

    Description: 
This parser includes functionality from:

* Classic QueryParser: most of its syntax
* SurroundQueryParser: recursive parsing for "near" and "not" clauses.
* ComplexPhraseQueryParser: can handle "near" queries that include multiterms (wildcard, fuzzy, regex, prefix).
* AnalyzingQueryParser: has an option to analyze multiterms.


Syntax shared with the classic QueryParser:
* term: test
* fuzzy: roam~0.8, roam~2
* wildcard: te?t, test*, t*st
* regex: /[mb]oat/
* phrase: "jakarta apache"
* phrase with slop: "jakarta apache"~3
* default "or" clause: jakarta apache
* grouping "or" clause: (jakarta apache)
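As a rough illustration of how the single-term forms above differ on the surface, here is a minimal, hypothetical Java sketch that classifies a bare token by its syntax. It is not code from SpanQueryParser or the classic QueryParser. The class name ClassicTermKind and its rules are assumptions, escaping is ignored, and phrases and groupings are out of scope. It follows one detail of the classic parser: a single trailing * (test*) is treated as a prefix query rather than a general wildcard.

```java
/**
 * Hypothetical sketch: classify a bare (unescaped) query token by its surface syntax.
 * Not taken from SpanQueryParser or Lucene; phrases, groups and escapes are ignored.
 */
public class ClassicTermKind {
    public enum Kind { TERM, FUZZY, PREFIX, WILDCARD, REGEX }

    public static Kind classify(String token) {
        // /[mb]oat/ : a regex is delimited by slashes
        if (token.length() > 1 && token.startsWith("/") && token.endsWith("/")) {
            return Kind.REGEX;
        }
        // roam~0.8, roam~2 : fuzzy marker with an optional similarity or edit distance
        if (token.contains("~")) {
            return Kind.FUZZY;
        }
        int firstStar = token.indexOf('*');
        boolean hasQuestion = token.indexOf('?') >= 0;
        // test* : exactly one '*', at the end, and no '?'
        if (firstStar > 0 && firstStar == token.length() - 1 && !hasQuestion) {
            return Kind.PREFIX;
        }
        // te?t, t*st : any other use of wildcard characters
        if (firstStar >= 0 || hasQuestion) {
            return Kind.WILDCARD;
        }
        return Kind.TERM;
    }

    public static void main(String[] args) {
        String[] examples = {"test", "roam~0.8", "roam~2", "te?t", "test*", "t*st", "/[mb]oat/"};
        for (String t : examples) {
            System.out.println(t + " -> " + classify(t));
        }
    }
}
```

The check order matters in this sketch: the regex test runs first so that characters such as ~ or * inside /.../ do not trigger the fuzzy or wildcard branches.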
 
Main additions in SpanQueryParser syntax vs. classic syntax:
* Can require in-order matching for phrases with slop via the ~> operator: "jakarta apache"~>3
* Can specify "not near": "fever bieber"!~3,10 ::
    find "fever" but not if "bieber" appears within 3 words before or 10 words after it.
* Fully recursive phrasal queries with [ and ], as in: [[jakarta apache]~3 lucene]~>4 ::
    find "jakarta" within 3 words of "apache", and that hit has to be within four words before "lucene".
* Can also use [ ] instead of double quotes for single-level phrasal queries, as in: [jakarta apache]
* Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"~3 :: find "apache" within three words of either "lucene" or "solr".
* Can use multiterms in phrasal queries: "jakarta~1 ap*che"~2
* Did I mention full recursion? [[jakarta~1 ap*che]~2 (solr~ /l[ou]+[cs][en]+/)]~10 :: find something like "jakarta" within two words of "ap*che", and that hit has to be within ten words of something like "solr" or a term matching the "lucene" regex.
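To make the positional semantics of ~, ~> and !~pre,post concrete, here is a minimal, brute-force Java model over a tokenized document. It is a hypothetical sketch, not SpanQueryParser's implementation or Lucene's span-matching algorithm. The class SpanSketch, its methods, and the reading of slop as "at most that many intervening tokens" are assumptions. Spans are half-open token intervals, and recursion falls out naturally because near() accepts the output of another near() as a clause.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, brute-force model of span semantics (not SpanQueryParser or Lucene code).
 * A span is a half-open token interval [start, end).
 */
public class SpanSketch {
    public record Span(int start, int end) {}

    /** Every position where the token equals the term becomes a one-token span. */
    public static List<Span> term(String[] tokens, String term) {
        List<Span> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                out.add(new Span(i, i + 1));
            }
        }
        return out;
    }

    /** "or grouping" inside a phrase: the union of the alternatives' spans. */
    @SafeVarargs
    public static List<Span> or(List<Span>... alternatives) {
        List<Span> out = new ArrayList<>();
        for (List<Span> alt : alternatives) {
            out.addAll(alt);
        }
        return out;
    }

    /**
     * Two-clause near: clauses may not overlap, and at most {@code slop} tokens may separate them.
     * inOrder=true models "~>" (a must precede b); inOrder=false models "~" (either order).
     */
    public static List<Span> near(List<Span> a, List<Span> b, int slop, boolean inOrder) {
        List<Span> out = new ArrayList<>();
        for (Span x : a) {
            for (Span y : b) {
                Span first = x;
                Span second = y;
                if (!inOrder && y.start() < x.start()) {
                    first = y;
                    second = x;
                }
                if (first.end() > second.start()) {
                    continue; // overlapping, or (when inOrder) b starts before a ends
                }
                if (second.start() - first.end() <= slop) {
                    out.add(new Span(first.start(), second.end()));
                }
            }
        }
        return out;
    }

    /** "not near" (!~pre,post): drop include spans with an exclude span in the pre/post window. */
    public static List<Span> notNear(List<Span> include, List<Span> exclude, int pre, int post) {
        List<Span> out = new ArrayList<>();
        for (Span x : include) {
            boolean blocked = false;
            for (Span y : exclude) {
                if (y.end() > x.start() - pre && y.start() < x.end() + post) {
                    blocked = true;
                    break;
                }
            }
            if (!blocked) {
                out.add(x);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] doc = "the jakarta project and apache software hosts lucene today".split(" ");
        // [[jakarta apache]~3 lucene]~>4
        List<Span> inner = near(term(doc, "jakarta"), term(doc, "apache"), 3, false);
        List<Span> outer = near(inner, term(doc, "lucene"), 4, true);
        System.out.println(outer);
    }
}
```

Running main prints [Span[start=1, end=8]]: "jakarta" (position 1) and "apache" (position 4) have two tokens between them, and that combined span ends two tokens before "lucene" (position 7). Swapping the outer clauses with inOrder=true yields no match, which is the difference ~> is meant to capture.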

In combination with a QueryFilter, this parser has been very useful for concordance tasks and for analytical search. SpanQueries can, of course, also be used as regular Queries for search via IndexSearcher.

Until LUCENE-2878 is closed, this may be useful for fans of SpanQuery.

Most of the documentation is in the javadoc for SpanQueryParser.

I'm happy to throw this in the Sandbox, if desired.

Any and all feedback is welcome.  Thank you.

  was:
This parser includes functionality from:

*Classic QueryParser: most of its syntax
*SurroundQueryParser: recursive parsing for "near" and "not" clauses.
*ComplexPhraseQueryParser: can handle "near" queries that include multiterms (wildcard, fuzzy, regex, prefix),
*AnalyzingQueryParser: has an option to analyze multiterms.

In combination with a QueryFilter, has been very useful for concordance tasks and for analytical search.  SpanQueries, of course, can also be used as a Query for regular search via IndexSearcher.

Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.

Most of the documentation is in the javadoc for SpanQueryParser.

I'm happy to throw this in the Sandbox, if desired.

Any and all feedback is welcome.  Thank you.


> [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-5205
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5205
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser
>            Reporter: Tim Allison
>              Labels: patch
>             Fix For: 4.6
>
>         Attachments: SpanQueryParser_v1.patch.gz
>
>



--
This message was sent by Atlassian JIRA
(v6.1#6144)
