Posted to dev@lucene.apache.org by "Mark Miller (JIRA)" <ji...@apache.org> on 2009/07/22 03:44:15 UTC

[jira] Issue Comment Edited: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733946#action_12733946 ] 

Mark Miller edited comment on LUCENE-1486 at 7/21/09 6:43 PM:
--------------------------------------------------------------

I originally thought it might live in contrib as well (see above), but I'm personally fine with it being in core.

bq.  It seems like a lot of queries you could enter here are not really supported and might throw strange exceptions.

A lot of queries? I think Adriano is just having trouble with phrases inside phrases, which are unsupported. Other unsupported things might throw exceptions too, but I think that's to be expected? I see what Adriano was talking about now: technically the first two quotes would match, and then the second two. I think Mark H was just demonstrating that you shouldn't try that query, though. A user might think they are quoting smith, but for the example it doesn't matter. I think he was just trying to show that you shouldn't try to "nest" phrases, even though they wouldn't be interpreted that way anyway.
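
To make the quote-pairing point concrete, here is a toy sketch. It is not the QueryParser grammar or anything from the patch; the class and method names (QuotePairing, quotedSegments) are made up for illustration, and it assumes quotes are simply paired left to right with no escaping.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotePairing {
    /**
     * Hypothetical helper: returns the text between each successive pair of
     * double quotes, scanning left to right. A trailing unmatched quote is ignored.
     */
    public static List<String> quotedSegments(String query) {
        List<String> segments = new ArrayList<>();
        int open = -1; // index of the currently unmatched opening quote, or -1 if none
        for (int i = 0; i < query.length(); i++) {
            if (query.charAt(i) != '"') {
                continue;
            }
            if (open < 0) {
                open = i; // this quote opens a phrase
            } else {
                segments.add(query.substring(open + 1, i)); // this quote closes it
                open = -1;
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        // The "nested" query from the issue description: "jo* "smith" "
        // Quotes 1 and 2 pair up, then quotes 3 and 4, so smith ends up
        // outside both phrases rather than inside a nested one.
        System.out.println(quotedSegments("\"jo* \"smith\" \""));
    }
}
```

Under that assumption the query splits into two phrases with smith sitting between them, which is why it can't be read as a phrase nested inside a phrase.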

It only supports a limited subset of the Lucene query language. Perhaps we could improve the exceptions being thrown, but the exceptions the QueryParser throws often leave just as much to be desired. I don't think it's experimental because of that.

Personally, I think the class does what it intends: it allows a limited subset of the Lucene query language in phrases. Of course, it could be improved.

I'll let Mark H respond though. I also don't mind seeing it moved to contrib, but I'm not sure anything glaring points to it being moved at the moment. It lives up to its limited contract, I think.

  
> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: ComplexPhraseQueryParser.java, junit_complex_phrase_qp_07_21_2009.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax. Examples from the JUnit test include:
> 		checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
> 		checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
> 		checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
> 		
> 		checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a phrase is bad
> 		checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
> 		checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
> Code plus JUnit test to follow...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

