Posted to solr-user@lucene.apache.org by jmr <jm...@free.fr> on 2010/10/21 11:53:39 UTC

A bug in ComplexPhraseQuery ?

Hi,

We have installed ComplexPhraseQuery and since then we have seen strange
behaviour in proximity search.

We have the 2 following queries:
(text:("protein digest"~50))
(text:("digest protein"~50))

Without ComplexPhraseQuery, both queries return 6 matching documents.
With ComplexPhraseQuery, query 1 returns 4 documents and query 2 returns 5
documents!

It seems that proximity search is broken. Is this a known problem?

Thanks for your help.

Regards,
J-Michel
-- 
View this message in context: http://lucene.472066.n3.nabble.com/A-bug-in-ComplexPhraseQuery-tp1744659p1744659.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: A bug in ComplexPhraseQuery ?

Posted by jmr <jm...@free.fr>.

iorixxx wrote:
> 
>> <queryParser name="complexphrase" class="org.apache.solr.search.ComplexPhraseQParserPlugin">
>>     <bool name="inOrder">false</bool>
>> </queryParser>
>> 
> 
> I added this change to SOLR-1604, can you test it and give us feedback?
> 
> 

Many thanks. I'll test this quite soon and let you know.
J-Michel
-- 
View this message in context: http://lucene.472066.n3.nabble.com/A-bug-in-ComplexPhraseQuery-tp1744659p1757145.html

Re: A bug in ComplexPhraseQuery ?

Posted by jmr <jm...@free.fr>.

iorixxx wrote:
> 
> 
> I added Terje Eggestad's fix[1], can you test it and give us feedback?
> 
> 

Hi,

Sorry for the delay. The fix works well, but we discovered another
query that crashes the parser:
a63b27/00:IC
"org.apache.lucene.search.PhraseQuery" found in phrase query string
"a63b27/00"
 at org.apache.lucene.queryParser.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite(ComplexPhraseQueryParser.java:256)
 at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:311)
 at org.apache.lucene.search.Query.weight(Query.java:98)
 at org.apache.lucene.search.Searcher.createWeight(Searcher.java:230)
 at org.apache.lucene.search.Searcher.search(Searcher.java:171)
 at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
The problem is with the / character.

We made a fix that seems to work. Here is the diff:

--- ComplexPhraseQueryParser.java.org	2010-11-04 02:56:04.000000000 +0100
+++ ComplexPhraseQueryParser.java	2010-11-05 10:14:08.062500000 +0100
@@ -245,7 +245,7 @@
 
     public Query rewrite(IndexReader reader) throws IOException {
       // ArrayList spanClauses = new ArrayList();
-      if (contents instanceof TermQuery) {
+      if (contents instanceof TermQuery || contents instanceof PhraseQuery) {
         return contents;
       }
       // Build a sequence of Span clauses arranged in a SpanNear - child
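
Editorial note: the messages in this thread do not show the analyzer configured for the field, so the following is only a sketch of one plausible cause. It assumes an analyzer that splits text on punctuation such as / and -. Under that assumption, a single query word like a63b27/00 yields two tokens, which the query parser can only represent as a PhraseQuery rather than a TermQuery; that is the type the rewrite method rejects. The class and method names below are hypothetical and do not belong to Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SplitTermSketch {

    /**
     * Hypothetical stand-in for an analyzer (an assumption, not the
     * poster's real configuration): lowercases the input and splits it
     * on every run of non-alphanumeric characters.
     */
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** True when one query word analyzes to several tokens and so needs a phrase. */
    public static boolean needsPhrase(String word) {
        return analyze(word).size() > 1;
    }

    public static void main(String[] args) {
        System.out.println(analyze("a63b27/00"));        // two tokens
        System.out.println(analyze("sulfur-reducing"));  // two tokens
        System.out.println(needsPhrase("protein"));      // single token
    }
}
```

Both queries reported as crashing in this thread contain such a splitting character, which is consistent with the diff above letting a PhraseQuery pass through unchanged.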

Jean-Michel
-- 
View this message in context: http://lucene.472066.n3.nabble.com/A-bug-in-ComplexPhraseQuery-tp1744659p2057933.html

Re: A bug in ComplexPhraseQuery ?

Posted by Ahmet Arslan <io...@yahoo.com>.
> However, we have found that this query is crashing when
> using
> ComplexPhraseQuery:
> "sulfur-reducing bacteria"
> 
> It is due to the dash inside the phrase.
> Here is the trace:
> java.lang.IllegalArgumentException: Unknown query type
> "org.apache.lucene.search.PhraseQuery" found in phrase
> query string
> "sulfur-reducing bacteria"

I added Terje Eggestad's fix[1], can you test it and give us feedback?

[1]https://issues.apache.org/jira/browse/LUCENE-1486?focusedCommentId=12900278&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12900278


      

Re: A bug in ComplexPhraseQuery ?

Posted by jmr <jm...@free.fr>.

iorixxx wrote:
> 
> 
> I added this change to SOLR-1604, can you test it and give us feedback?
> 
> 

Hi,

Sorry for the delay.
We have tested the change and it works for this case.

However, we have found that the following query crashes when using
ComplexPhraseQuery:
"sulfur-reducing bacteria"

It is due to the dash inside the phrase.
Here is the trace:
java.lang.IllegalArgumentException: Unknown query type
"org.apache.lucene.search.PhraseQuery" found in phrase query string
"sulfur-reducing bacteria"
 at org.apache.lucene.queryParser.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite(ComplexPhraseQueryParser.java:290)
 at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:438)
 at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:311)
 at org.apache.lucene.search.Query.weight(Query.java:98)
 at org.apache.lucene.search.Searcher.createWeight(Searcher.java:230)
 at org.apache.lucene.search.Searcher.search(Searcher.java:171)
 at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
...

Regards
Jean-Michel

-- 
View this message in context: http://lucene.472066.n3.nabble.com/A-bug-in-ComplexPhraseQuery-tp1744659p1835918.html

Re: A bug in ComplexPhraseQuery ?

Posted by Ahmet Arslan <io...@yahoo.com>.
> <queryParser name="complexphrase" class="org.apache.solr.search.ComplexPhraseQParserPlugin">
>     <bool name="inOrder">false</bool>
> </queryParser>
> 

I added this change to SOLR-1604, can you test it and give us feedback?


      

Re: A bug in ComplexPhraseQuery ?

Posted by Ahmet Arslan <io...@yahoo.com>.
> In my opinion, ordering term in a proximity search does not
> make sense!
> So the work around for us is to generate the opposite
> search every time a
> proximity operator is used.
> not very elegant!

If you want, I can make it configurable. You could then set your choice in solrconfig.xml like this:

<queryParser name="complexphrase" class="org.apache.solr.search.ComplexPhraseQParserPlugin">
    <bool name="inOrder">false</bool>
</queryParser>


      

Re: A bug in ComplexPhraseQuery ?

Posted by jmr <jm...@free.fr>.

iorixxx wrote:
> 
> ComplexPhraseQuery is an ordered phrase query, whereas Lucene's default
> PhraseQuery is unordered. With ComplexPhrase, the order of the terms matters.
> 

Thanks for your answer.
With this request:
(text:("protein digest"~50)) || (text:("digest protein"~50))
I get my 6 documents.

In my opinion, ordering the terms in a proximity search does not make sense!
So the workaround for us is to generate the reversed search every time a
proximity operator is used.
Not very elegant!
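
Editorial note: the workaround described above can be sketched as a small helper that builds the combined query string for a two-term proximity search. This is only an illustration. The class name, method name, and parameters are hypothetical, and the helper does no escaping of query-parser special characters, so it assumes plain alphanumeric terms.

```java
public class BothOrdersSketch {

    /**
     * Builds (field:("a b"~slop)) || (field:("b a"~slop)), so that an
     * ordered proximity parser matches the two terms in either order.
     * Assumes termA and termB contain no characters that need escaping.
     */
    public static String bothOrders(String field, String termA, String termB, int slop) {
        String forward = String.format("(%s:(\"%s %s\"~%d))", field, termA, termB, slop);
        String reverse = String.format("(%s:(\"%s %s\"~%d))", field, termB, termA, slop);
        return forward + " || " + reverse;
    }

    public static void main(String[] args) {
        // Prints: (text:("protein digest"~50)) || (text:("digest protein"~50))
        System.out.println(bothOrders("text", "protein", "digest", 50));
    }
}
```

With more than two terms the number of orderings grows quickly, which is one reason a parser-level switch is more attractive than rewriting queries on the client side.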

Anyway, thanks again for the answer,
J-Michel
-- 
View this message in context: http://lucene.472066.n3.nabble.com/A-bug-in-ComplexPhraseQuery-tp1744659p1750748.html

Re: A bug in ComplexPhraseQuery ?

Posted by Ahmet Arslan <io...@yahoo.com>.

--- On Thu, 10/21/10, jmr <jm...@free.fr> wrote:

> From: jmr <jm...@free.fr>
> Subject: A bug in ComplexPhraseQuery ?
> To: solr-user@lucene.apache.org
> Date: Thursday, October 21, 2010, 12:53 PM
> 
> Hi,
> 
> We have installed ComplexPhraseQuery and since that we can
> see strange
> behaviour in proximity search.
> 
> We have the 2 following queries:
> (text:("protein digest"~50))
> (text:("digest protein"~50))
> 
> Without ComplexPhraseQuery, both queries are returning 6
> documents matching.
> With ComplexPhraseQuery, query 1 returns 4 documents and
> query 2 returns 5
> documents!
> 
> It seems that proximity search is broken. Is this a known
> problem ?

ComplexPhraseQuery is an ordered phrase query, whereas Lucene's default PhraseQuery is unordered. With ComplexPhrase, the order of the terms matters.
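
Editorial note: the difference between ordered and unordered proximity matching can be illustrated with a deliberately simplified sketch. It is not Lucene code. The class and method names are hypothetical, and slop is treated here simply as the maximum number of positions allowed between the two terms, whereas Lucene's real slop computation is more involved.

```java
import java.util.Arrays;
import java.util.List;

public class ProximityOrderSketch {

    /** Ordered match: `first` occurs before `second` with at most `slop` positions in between. */
    public static boolean matchesOrdered(List<String> tokens, String first, String second, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            // Only look ahead while the gap between positions i and j is within slop.
            for (int j = i + 1; j < tokens.size() && j - i - 1 <= slop; j++) {
                if (tokens.get(j).equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Unordered match: the same test, accepted in either order. */
    public static boolean matchesUnordered(List<String> tokens, String a, String b, int slop) {
        return matchesOrdered(tokens, a, b, slop) || matchesOrdered(tokens, b, a, slop);
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("digest", "the", "protein", "overnight");
        System.out.println(matchesOrdered(doc, "protein", "digest", 50));   // false
        System.out.println(matchesOrdered(doc, "digest", "protein", 50));   // true
        System.out.println(matchesUnordered(doc, "protein", "digest", 50)); // true
    }
}
```

A document containing the terms only in the order "digest ... protein" is found by the ordered query "digest protein"~50 but not by "protein digest"~50, which is consistent with the two queries returning different document counts.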