
Posted to java-user@lucene.apache.org by javier muguruza <jm...@gmail.com> on 2005/12/20 18:42:03 UTC

MatchAllDocsQuery with BooleanClause.Occur.MUST_NOT

Hi,

If I run a query like this:
-(-body:angel) -(-body:darpa)
I get 0 hits. As I did not find any thread about that case, I thought
ANDing with a MatchAllDocsQuery would return my desired set (all docs
except the ones with angel or darpa).

So I do the following:
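BooleanQuery mbq = new BooleanQuery();
BooleanQuery mydocs = new BooleanQuery();  // holds only -body:darpa
mydocs.add(new TermQuery(new Term("body", "darpa")), BooleanClause.Occur.MUST_NOT);
MatchAllDocsQuery alldocs = new MatchAllDocsQuery();
mbq.add(mydocs, BooleanClause.Occur.MUST_NOT);
mbq.add(alldocs, BooleanClause.Occur.MUST);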

and I still get 0 hits. If I only add the alldocs query I get some
hits. Does anyone see something wrong with my approach?

javier

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: MatchAllDocsQuery with BooleanClause.Occur.MUST_NOT

Posted by javier muguruza <jm...@gmail.com>.

The double negatives are a byproduct of the code that builds the query:
the user has several fields to choose from, etc. It is not done on
purpose.

With Luke I was able to see that:
+body:memo 1 hit
+body:memo -body:policy 0
+body:memo -(-body:policy) 1
+body:memo +(-body:policy) 0
In the last two cases, if I substitute a nonexistent word for policy
(policy is in the index) I get the same results, so the extra -() is
what breaks the query. In my earlier tests while coding I had (wrongly)
concluded that the extra -() was harmless... my bad.
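
The four Luke results above are consistent with a simple set picture: a
boolean query starts from its required clauses and subtracts its
prohibited ones, so a group that contains only a prohibited clause has
nothing to start from and matches no documents. The sketch below is a
toy model in plain Java, not Lucene code. The class name, the helper
names and the three-document index (memo in doc 1, policy in docs 1 and
3) are assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Toy model only, not Lucene code: a clause is the set of doc ids it matches.
class NestedNegationModel {

    // required == null models "no non-prohibited clause at all".
    static Set<Integer> eval(Set<Integer> required, Set<Integer> prohibited) {
        Set<Integer> hits = new TreeSet<>();
        if (required == null) {
            return hits;               // nothing to subtract from, so no hits
        }
        hits.addAll(required);
        hits.removeAll(prohibited);
        return hits;
    }

    // Intersection of two required clauses.
    static Set<Integer> both(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    static Set<Integer> ids(Integer... docs) {
        return new TreeSet<>(Arrays.asList(docs));
    }

    // Hit counts for the four queries, in the order listed in the message.
    static int[] lukeCases() {
        Set<Integer> memo = ids(1);
        Set<Integer> policy = ids(1, 3);
        Set<Integer> none = ids();
        Set<Integer> notPolicy = eval(null, policy);   // (-body:policy) on its own
        return new int[] {
            eval(memo, none).size(),                   // +body:memo
            eval(memo, policy).size(),                 // +body:memo -body:policy
            eval(memo, notPolicy).size(),              // +body:memo -(-body:policy)
            eval(both(memo, notPolicy), none).size()   // +body:memo +(-body:policy)
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lukeCases()));  // prints [1, 0, 1, 0]
    }
}
```

In this model the inner group is empty whatever term it names, which
would explain why swapping policy for a nonexistent word changes
nothing in the last two cases.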

I have refactored my code so that the extra -() are avoided, and now
everything is running OK.

thanks,
javier

On 12/20/05, Yonik Seeley <ys...@gmail.com> wrote:
> I'm not sure I understand the purpose of the double negatives.
> Also, a boolean query must have some non-prohibited clauses to match
> anything (and mydocs in your code example violates this).
>
> Is the following what you are trying to do?
>
> BooleanQuery mbq = new BooleanQuery();
> TermQuery mydocs = new TermQuery(new Term("body", "darpa"));  // body:darpa
> MatchAllDocsQuery alldocs = new MatchAllDocsQuery();
> mbq.add(mydocs, BooleanClause.Occur.MUST_NOT);
> mbq.add(alldocs, BooleanClause.Occur.MUST);
>
> -Yonik

Re: MatchAllDocsQuery with BooleanClause.Occur.MUST_NOT

Posted by Yonik Seeley <ys...@gmail.com>.

I'm not sure I understand the purpose of the double negatives.
Also, a boolean query must have some non-prohibited clauses to match
anything (and mydocs in your code example violates this).

Is the following what you are trying to do?

BooleanQuery mbq = new BooleanQuery();
TermQuery mydocs = new TermQuery(new Term("body", "darpa"));  // body:darpa
MatchAllDocsQuery alldocs = new MatchAllDocsQuery();
mbq.add(mydocs, BooleanClause.Occur.MUST_NOT);
mbq.add(alldocs, BooleanClause.Occur.MUST);

-Yonik
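
As a rough illustration of the rule that a boolean query needs at least
one non-prohibited clause, here is a toy set model in plain Java. It is
not Lucene code: the class name MatchAllModel, the eval and ids helpers
and the five-document index are assumptions invented for this sketch,
and optional (SHOULD) clauses are left out. The match-all clause is
modelled as the set of every document id.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only, not Lucene code: each clause is the set of doc ids it matches.
class MatchAllModel {

    static Set<Integer> ids(Integer... docs) {
        return new TreeSet<>(Arrays.asList(docs));
    }

    // Intersect the required clauses, then subtract every prohibited clause.
    // With no required clause the result stays empty.
    static Set<Integer> eval(List<Set<Integer>> must, List<Set<Integer>> mustNot) {
        Set<Integer> hits = new TreeSet<>();
        if (must.isEmpty()) {
            return hits;
        }
        hits.addAll(must.get(0));
        for (Set<Integer> m : must) {
            hits.retainAll(m);
        }
        for (Set<Integer> n : mustNot) {
            hits.removeAll(n);
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<Integer> all = ids(1, 2, 3, 4, 5);   // stands in for MatchAllDocsQuery
        Set<Integer> angel = ids(2);             // docs containing body:angel
        Set<Integer> darpa = ids(2, 4);          // docs containing body:darpa
        List<Set<Integer>> prohibited = Arrays.asList(angel, darpa);
        List<Set<Integer>> noMust = Collections.emptyList();

        // -body:angel -body:darpa : only prohibited clauses
        System.out.println(eval(noMust, prohibited));                          // prints []
        // match-all as MUST, both terms as MUST_NOT
        System.out.println(eval(Collections.singletonList(all), prohibited));  // prints [1, 3, 5]
    }
}
```

The second call is the shape of the query suggested above, extended to
both terms: one required match-all clause and one MUST_NOT clause per
unwanted term, with no nested negative groups.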


On 12/20/05, javier muguruza <jm...@gmail.com> wrote:
> Hi,
>
> If I run a query like this:
> -(-body:angel) -(-body:darpa)
> I get 0 hits. As I did not find any thread about that case, I thought
> ANDing with a MatchAllDocsQuery would return my desired set (all docs
> except the ones with angel or darpa).
>
> So I do the following:
> BooleanQuery mbq = new BooleanQuery();
> BooleanQuery mydocs = new BooleanQuery();  // holds only -body:darpa
> mydocs.add(new TermQuery(new Term("body", "darpa")), BooleanClause.Occur.MUST_NOT);
> MatchAllDocsQuery alldocs = new MatchAllDocsQuery();
> mbq.add(mydocs, BooleanClause.Occur.MUST_NOT);
> mbq.add(alldocs, BooleanClause.Occur.MUST);
>
> and I still get 0 hits. If I only add the alldocs query I get some
> hits. Does anyone see something wrong with my approach?
>
> javier


Re: ParseQuery with quotes

Posted by Yonik Seeley <ys...@gmail.com>.

On 12/20/05, John Powers <jp...@configureone.com> wrote:
> Ok, I understand the .toString() part.
>
> But, if I have some 19" in the text of these items, and I do a search with
> 19", that has been escaped before parsing....why am I not getting anything?
> The indexer analyzer took them out?   So then to find these documents, I
> would want to either change the analyzer or take the quote out of the query?
> Correct?

Right.  Most likely the index analyzer removed the quote.  You can
verify with something like Luke.
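
To make the "analyzer removed the quote" explanation concrete, here is
a deliberately simplified stand-in for an index-time analyzer, written
in plain Java. It is not any of Lucene's analyzers: the class name
ToyAnalyzer and its splitting rule (lowercase, break on anything that
is not a letter or digit) are assumptions chosen only to show the
effect. The sample text is invented.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for an analyzer; real analyzers have their own rules.
class ToyAnalyzer {

    // Lowercase the text and split on every run of non-letter, non-digit characters.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> indexed = tokens("19\" LCD monitor");
        System.out.println(indexed);                    // prints [19, lcd, monitor]
        System.out.println(indexed.contains("19\""));   // prints false
        System.out.println(indexed.contains("19"));     // prints true
    }
}
```

If the real analyzer behaves anything like this, the index holds the
term 19 and never 19", so a query term that still carries the quote has
nothing to match, however it was escaped.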

-Yonik


Re: ParseQuery with quotes

Posted by Yonik Seeley <ys...@gmail.com>.

Here's more on query-parser escaping gotchas:

http://www.mail-archive.com/java-user@lucene.apache.org/msg02354.html

-Yonik


Re: ParseQuery with quotes

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

As always, the best things to do are to know exactly what you indexed  
(AnalyzerDemo from Lucene in Action helps here, and of course the  
wonderful tool, Luke).  Then try simplifying your query to a  
TermQuery for a term you know or suspect you indexed.  QueryParser  
introduces a lot of variables to the equation, such as operators and  
special characters that get interpreted by the parser itself,  
escaping rules, and analysis - not to mention the Query.toString()  
issue you've discovered.

	Erik


On Dec 20, 2005, at 4:14 PM, John Powers wrote:

> Ok, I understand the .toString() part.
>
> But, if I have some 19" in the text of these items, and I do a search with
> 19", that has been escaped before parsing....why am I not getting anything?
> The indexer analyzer took them out?   So then to find these documents, I
> would want to either change the analyzer or take the quote out of the query?
> Correct?

Re: ParseQuery with quotes

Posted by John Powers <jp...@configureone.com>.

Ok, I understand the .toString() part.

But, if I have some 19" in the text of these items, and I do a search with
19", that has been escaped before parsing....why am I not getting anything?
The indexer analyzer took them out?   So then to find these documents, I
would want to either change the analyzer or take the quote out of the query?
Correct?


On 12/20/05 12:07 PM, "Yonik Seeley" <ys...@gmail.com> wrote:

> On 12/20/05, John Powers <jp...@configureone.com> wrote:
>> I would like to be able to search for 19 inches with the quote.  So I get a
>> query like this:
>> Line 1:  +( (name:19"*^4 ld:19"*^2 sd:19"*^3 kw:19"*^1) )
>> 
>> 
>> That won't work, so I wanted to escape the quotes.    The docs said to use a
>> backslash.  So I'm doing this:
>> 
>> luceneQuery.toString().replaceAll("\"", "\\\\\"")
> 
> It should be " => \",
> so shouldn't that be luceneQuery.toString().replaceAll("\"", "\\\"")
> 
>> And the result is:
>> 
>> Line 2:  +( (name:19\"*^4 ld:19\"*^2 sd:19\"*^3 kw:19\"*^1) )
>> 
>> This looks right.  Then I put it into a MultiFieldQueryParser.parse() via:
>> 
>> query = MultiFieldQueryParser.parse(
>>     luceneQuery.toString().replaceAll("\"", "\\\\\""),
>>     Indexer.CARTABLE, Indexer.analyzer);
>> 
>> 
>> and I get:
>> 
>> Parsed query: +(name:19"*^4.0 ld:19"*^2.0 sd:19"*^3.0 kw:19"*)
> 
> This looks right.
> 
>> Which of course, won't work.   How can I get the escapes to survive through
>> the parsing?
> 
> The escapes are for parsing.  You don't want them to survive parsing
> unless you also indexed them with escapes.
> Perhaps what is causing your confusion is that Query.toString()
> doesn't currently re-escape special characters.
> 
> -Yonik

Re: ParseQuery with quotes

Posted by Yonik Seeley <ys...@gmail.com>.

On 12/20/05, John Powers <jp...@configureone.com> wrote:
> I would like to be able to search for 19 inches with the quote.  So I get a
> query like this:
> Line 1:  +( (name:19"*^4 ld:19"*^2 sd:19"*^3 kw:19"*^1) )
>
>
> That won't work, so I wanted to escape the quotes.    The docs said to use a
> backslash.  So I'm doing this:
>
> luceneQuery.toString().replaceAll("\"", "\\\\\"")

It should be " => \",
so shouldn't that be luceneQuery.toString().replaceAll("\"", "\\\"")
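
A note on the two replaceAll variants, since they are easy to mix up:
String.replaceAll treats a backslash in the replacement string as an
escape character, so the replacement needs one more level of escaping
than the Java string literal alone suggests. The plain-Java check below
compares both spellings with String.replace, which takes the
replacement literally. The class and method names are made up for the
sketch; only the java.lang.String calls are real.

```java
// Compares three ways of turning " into \" in a query string.
class QuoteEscapeDemo {

    // Five backslashes in source: replacement chars are \ \ " which replaceAll emits as \"
    static String fiveBackslashes(String q) {
        return q.replaceAll("\"", "\\\\\"");
    }

    // Three backslashes in source: replacement chars are \ " which replaceAll emits as just "
    static String threeBackslashes(String q) {
        return q.replaceAll("\"", "\\\"");
    }

    // String.replace does no regex or replacement-escape processing at all.
    static String literalReplace(String q) {
        return q.replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String q = "name:19\"*^4";
        System.out.println(fiveBackslashes(q));   // prints name:19\"*^4
        System.out.println(threeBackslashes(q));  // prints name:19"*^4 (unchanged)
        System.out.println(literalReplace(q));    // prints name:19\"*^4
    }
}
```

On this reading the five-backslash form in the original post already
produces \" (which matches the "Line 2" output quoted below), while the
three-backslash form leaves the string as it was. String.replace avoids
the question altogether.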

> And the result is:
>
> Line 2:  +( (name:19\"*^4 ld:19\"*^2 sd:19\"*^3 kw:19\"*^1) )
>
> This looks right.  Then I put it into a MultiFieldQueryParser.parse() via:
>
> query = MultiFieldQueryParser.parse(
>     luceneQuery.toString().replaceAll("\"", "\\\\\""),
>     Indexer.CARTABLE, Indexer.analyzer);
>
>
> and I get:
>
> Parsed query: +(name:19"*^4.0 ld:19"*^2.0 sd:19"*^3.0 kw:19"*)

This looks right.

> Which of course, won't work.   How can I get the escapes to survive through
> the parsing?

The escapes are for parsing.  You don't want them to survive parsing
unless you also indexed them with escapes.
Perhaps what is causing your confusion is that Query.toString()
doesn't currently re-escape special characters.
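
The point that escapes exist only for the parser can be sketched as a
round trip in plain Java. This is a hand-rolled illustration, not the
query parser's own code: the class EscapeRoundTrip, its two methods and
the SPECIAL list (copied by hand from the special characters named in
the query syntax documentation) are assumptions for this example.

```java
// Hand-rolled escape/unescape pair; not the query parser's implementation.
class EscapeRoundTrip {

    // Characters treated as special in query syntax (hand-copied list, an assumption).
    static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    // Put a backslash in front of every special character in user-supplied text.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // What parsing conceptually does with an escape: drop the backslash, keep the next char.
    static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 1 < escaped.length()) {
                c = escaped.charAt(++i);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escape only the user's text, then append the wildcard unescaped.
        String forParser = escape("19\"") + "*";
        System.out.println(forParser);             // prints 19\"*
        System.out.println(unescape(forParser));   // prints 19"*
    }
}
```

The second line of output has the same shape as the "Parsed query"
shown above: once parsing is done the backslash has served its purpose
and is gone, which is why printing the parsed query shows a bare quote.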

-Yonik


ParseQuery with quotes

Posted by John Powers <jp...@configureone.com>.

I would like to be able to search for 19 inches with the quote.  So I get a
query like this:
Line 1:  +( (name:19"*^4 ld:19"*^2 sd:19"*^3 kw:19"*^1) )


That won't work, so I wanted to escape the quotes.    The docs said to use a
backslash.  So I'm doing this:

luceneQuery.toString().replaceAll("\"", "\\\\\"")

And the result is:

Line 2:  +( (name:19\"*^4 ld:19\"*^2 sd:19\"*^3 kw:19\"*^1) )

This looks right.  Then I put it into a MultiFieldQueryParser.parse() via:

query = MultiFieldQueryParser.parse(
    luceneQuery.toString().replaceAll("\"", "\\\\\""),
    Indexer.CARTABLE, Indexer.analyzer);


and I get:

Parsed query: +(name:19"*^4.0 ld:19"*^2.0 sd:19"*^3.0 kw:19"*)

Which of course, won't work.   How can I get the escapes to survive through
the parsing?

Line two looks like what I want right before it gets parsed.



