Posted to solr-user@lucene.apache.org by Michael Jakl <ja...@gmail.com> on 2012/07/04 15:29:56 UTC

Get all matching terms of an OR query

Hi,
is there an easy way to get the matches of an OR query?

If I'm searching for "android OR google OR apple OR iphone OR -ipod",
I'd like to know which of these terms document X contains.

I've been using debugQuery and tried to extract the info from the
explain information. Unfortunately, this is too slow, and I'm having
trouble with the stemming of the query.

Using the highlight component doesn't work either, because my fields
aren't stored (would the highlighter work with stemmed texts?).

We're using Solr 3.6 in a distributed setting. I'd like to avoid
storing the texts because of space issues, but if that's the only
reasonable solution...

Thank you,
Michael

Re: Get all matching terms of an OR query

Posted by Michael Jakl <ja...@gmail.com>.
Thank you!

On 4 July 2012 17:37, Jack Krupansky <ja...@basetechnology.com> wrote:
> What exactly is it that is too slow?

I was comparing queries with "debugQuery" enabled and disabled. The
difference was 60 seconds versus 30 seconds for some unusual, large
queries (many terms over a large set of documents chosen by filter
queries). Once the caches are warm, the performance is of course far
better.

> It would be nice to have an optional search component or query parser option
> that returned the analyzed term for each query term.

Yes, I was thinking of reusing the analysis.jsp for that task, but
couldn't see an easy way to handle phrase queries and wasn't sure if
it performs better than the debugQuery approach.

> But as things stand, I would suggest that you do your own "fuzzy match"
> between the debugQuery terms and your source terms. That may not be 100%
> accurate, but probably would cover most/many cases.

Thanks, that's reassuring :)

Cheers,
Michael

Re: Get all matching terms of an OR query

Posted by Jack Krupansky <ja...@basetechnology.com>.
You could always do a custom search component, but all the same information 
(which terms matched) is in the debugQuery. For example, 
"queryWeight(text:the)" indicates that "the" appears in the document.

What exactly is it that is too slow?

Yes, you do have to accept that explain uses analyzed terms. I would note 
that you could try to correlate the "parsedquery" with the original query 
since the parsed query will contain stemmed terms.
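
As a rough illustration of that correlation, here is a minimal Python
sketch. It assumes the "parsedquery" string lists its clauses as
field:term in the same order as the original query, and that the
analyzer emits exactly one token per source term (no stopword removal,
no phrase queries). The sample string and the function names are
hypothetical, not taken from a real Solr response.

```python
import re

# Matches clauses of the form field:term inside a parsedquery string.
CLAUSE = re.compile(r"(\w+):(\w+)")


def analyzed_terms(parsedquery):
    """Return the analyzed (e.g. stemmed) terms in order of appearance."""
    return [m.group(2) for m in CLAUSE.finditer(parsedquery)]


def correlate(source_terms, parsedquery):
    """Map each analyzed term back to the source term at the same position.

    Assumption: one analyzed token per source term, same order.
    """
    analyzed = analyzed_terms(parsedquery)
    if len(analyzed) != len(source_terms):
        raise ValueError("term counts differ; positional mapping is unsafe")
    return dict(zip(analyzed, source_terms))


if __name__ == "__main__":
    # Hypothetical parsedquery value for a stemmed "text" field.
    sample = "text:android text:googl text:appl text:iphon"
    print(correlate(["android", "google", "apple", "iphone"], sample))
```

The length check is deliberate: if the analyzer drops or splits a term,
the positional mapping would silently pair the wrong terms.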

It would be nice to have an optional search component or query parser option 
that returned the analyzed term for each query term.

But as things stand, I would suggest that you do your own "fuzzy match" 
between the debugQuery terms and your source terms. That may not be 100% 
accurate, but probably would cover most/many cases.
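
A minimal Python sketch of such a fuzzy match, under stated
assumptions: the explain text for one document is already available as
a string, matched terms show up as "queryWeight(field:term)" entries
as in the example above, and a stemmed term is similar enough to its
source term for difflib to pair them. The function names, the sample
explain text, and the 0.6 cutoff are illustrative choices only.

```python
import re
from difflib import get_close_matches

# Matches entries such as queryWeight(text:the) in the explain output.
QUERY_WEIGHT = re.compile(r"queryWeight\((\w+):([^)]+)\)")


def matched_analyzed_terms(explain_text):
    """Return the set of (field, analyzed_term) pairs found in the explain text."""
    return {(m.group(1), m.group(2)) for m in QUERY_WEIGHT.finditer(explain_text)}


def map_to_source_terms(analyzed, source_terms, cutoff=0.6):
    """Fuzzy-match analyzed terms against the terms of the original query.

    Returns the source terms that have a sufficiently similar analyzed term.
    This is approximate and can miss or mispair aggressively stemmed terms.
    """
    matched = set()
    for _field, term in analyzed:
        closest = get_close_matches(term, source_terms, n=1, cutoff=cutoff)
        if closest:
            matched.add(closest[0])
    return matched


if __name__ == "__main__":
    # Hypothetical explain fragment for one document.
    explain = (
        "0.4 = queryWeight(text:android), product of:\n"
        "0.3 = queryWeight(text:appl), product of:"
    )
    terms = ["android", "google", "apple", "iphone"]
    print(map_to_source_terms(matched_analyzed_terms(explain), terms))
```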

-- Jack Krupansky

-----Original Message----- 
From: Michael Jakl
Sent: Wednesday, July 04, 2012 10:09 AM
To: solr-user@lucene.apache.org
Subject: Re: Get all matching terms of an OR query


Re: Get all matching terms of an OR query

Posted by Michael Jakl <ja...@gmail.com>.
Hi!

On 4 July 2012 17:01, Jack Krupansky <ja...@basetechnology.com> wrote:
> First, "OR -ipod" needs to be written as "OR (*:* -ipod)" due to an ongoing
> deficiency in Lucene query parsing, but I wonder what you really think you
> are OR'ing in that clause - all documents that don't contain "ipod"? That
> seems odd. Maybe you really want to constrain the preceding query to exclude
> ipod? That would be:
>
> (android OR google OR apple OR iphone) -ipod

Thanks, the example was ill-chosen; the -ipod part shouldn't be there.

After some more tests and research, using the debugQuery method seems
to be the only viable solution(?)

Cheers,
Michael

Re: Get all matching terms of an OR query

Posted by Jack Krupansky <ja...@basetechnology.com>.
First, "OR -ipod" needs to be written as "OR (*:* -ipod)" due to an ongoing 
deficiency in Lucene query parsing, but I wonder what you really think you 
are OR'ing in that clause - all documents that don't contain "ipod"? That 
seems odd. Maybe you really want to constrain the preceding query to exclude 
ipod? That would be:

(android OR google OR apple OR iphone) -ipod
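
For reference, a small Python sketch that encodes both query forms as
request parameters. Only the "q" and "debugQuery" parameters come from
this thread; the helper name is hypothetical, and the host and handler
path are left out on purpose.

```python
from urllib.parse import urlencode, parse_qs

# Exclude ipod from the results of the OR query.
CONSTRAINED = "(android OR google OR apple OR iphone) -ipod"

# Literal "OR everything that lacks ipod", written with the *:* workaround.
PURE_NEGATIVE_OR = "android OR google OR apple OR iphone OR (*:* -ipod)"


def select_params(q, debug=False):
    """Return a URL-encoded parameter string for a Solr select request."""
    params = {"q": q}
    if debug:
        params["debugQuery"] = "true"
    return urlencode(params)


if __name__ == "__main__":
    encoded = select_params(CONSTRAINED, debug=True)
    print(encoded)
    # Decoding gives back the original query text.
    print(parse_qs(encoded)["q"][0])
```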

-- Jack Krupansky

-----Original Message----- 
From: Michael Jakl
Sent: Wednesday, July 04, 2012 8:29 AM
To: solr-user@lucene.apache.org
Subject: Get all matching terms of an OR query
