Posted to java-user@lucene.apache.org by Edwin Lee <ed...@hotmail.com> on 2008/04/19 10:42:40 UTC

How to Retrieve Found Term?

Hi all,

I'm using Lucene 2.3.1. What I'm trying to do seems straightforward enough (to me), but I just can't find a method for it.

Let's say I'm running a PhraseQuery for the phrase "apples and oranges" with a non-zero slop value, and it returns, e.g., 20 Hits. Because I'm using a non-zero slop value, the phrase that actually gets found could be something like "oranges and apples" instead. For each Hit returned, I would like to find out the actual terms from the document that were found. How can I do that?



Thanks in advance,
Edwin


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: How to Retrieve Found Term?

Posted by Edwin Lee <ed...@hotmail.com>.
Hi Karl,

Thanks for the suggestions. I would be glad to contribute back to the project.

I'm not too familiar with the inner workings of Lucene though; how would such functionality fit into a Query implementation?

My naive interpretation, when I first got hold of Lucene, was that the Query is what you want to search for, and the Searcher then does the work against the Directory, which contains the indexed material, producing Hits, which contain the search results. So my initial (also naive) thought was that a Hit would contain what I wanted (the found term).



Regards,
Edwin



> Date: Tue, 22 Apr 2008 11:57:03 +0200
> From: karl.wettin@gmail.com
> To: java-user@lucene.apache.org
> Subject: Re: How to Retrieve Found Term?



Re: How to Retrieve Found Term?

Posted by Karl Wettin <ka...@gmail.com>.
I can think of two ways to get your hands on this information. The
simplest is to create a filter from the documents that matched your
original query and then run new queries against the index, with slop,
without slop, etc., to find out what is what. This will of course be
very expensive, so it is only an interesting solution for you if the
response time is good enough.
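
A minimal sketch of this first idea, written against plain token lists
rather than the Lucene API so that it stands alone. In a real setup each
candidate ordering would presumably be a slop-0 PhraseQuery restricted by
a Filter over the documents that already matched; here
Collections.indexOfSubList stands in for that exact match. The class and
method names are hypothetical, the phrase terms are assumed to be
distinct, and only reorderings are covered, not matches with extra words
in between.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not part of Lucene.
class ExactVariantCheck {

    // Every ordering of the phrase terms (3 distinct terms give 6 candidates).
    static List<List<String>> permutations(List<String> terms) {
        List<List<String>> out = new ArrayList<>();
        permute(new ArrayList<>(terms), 0, out);
        return out;
    }

    // Classic swap-based permutation generator.
    private static void permute(List<String> work, int k, List<List<String>> out) {
        if (k == work.size()) {
            out.add(new ArrayList<>(work));
            return;
        }
        for (int i = k; i < work.size(); i++) {
            Collections.swap(work, k, i);
            permute(work, k + 1, out);
            Collections.swap(work, k, i); // restore before trying the next choice
        }
    }

    // Orderings that occur contiguously in the document: the analogue of
    // re-running each ordering as an exact (slop 0) phrase.
    static List<String> exactVariantsIn(List<String> docTokens, List<String> terms) {
        List<String> found = new ArrayList<>();
        for (List<String> candidate : permutations(terms)) {
            if (Collections.indexOfSubList(docTokens, candidate) >= 0) {
                found.add(String.join(" ", candidate));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("i", "like", "oranges", "and", "apples", "today");
        List<String> phrase = Arrays.asList("apples", "and", "oranges");
        System.out.println(exactVariantsIn(doc, phrase)); // prints [oranges and apples]
    }
}
```

The number of candidate orderings grows factorially with the phrase
length, which is one concrete reason this route gets expensive.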

The second and cleaner solution is to create your own custom Query(ies)
or hack the Weight of the queries you are currently using and have them
store this information when matching. A good implementation of this
would probably be a welcome contribution to the project.
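
To make "store this information when matching" concrete, here is a rough
sketch of what such a recorder might compute for one document. It does
not touch Lucene's Query or Weight classes, and it does not reproduce
Lucene's slop calculation: it simply assumes the document's tokens are
available in order and looks for spans that start on a query term and
cover all query terms within queryTerms.size() + maxExtraTokens tokens.
The names FoundPhraseSketch, findMatchedSpans and maxExtraTokens are
hypothetical, and the query terms are assumed to be distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not part of Lucene.
class FoundPhraseSketch {

    // Returns the text of every span that begins on a query term and
    // contains all query terms within the allowed window. This is a
    // simple window test, NOT Lucene's sloppy-phrase algorithm.
    static List<String> findMatchedSpans(List<String> docTokens,
                                         List<String> queryTerms,
                                         int maxExtraTokens) {
        Set<String> wanted = new HashSet<>(queryTerms);
        int maxWidth = queryTerms.size() + maxExtraTokens;
        List<String> spans = new ArrayList<>();

        for (int start = 0; start < docTokens.size(); start++) {
            if (!wanted.contains(docTokens.get(start))) {
                continue; // a span must begin on one of the query terms
            }
            Set<String> seen = new HashSet<>();
            int limit = Math.min(docTokens.size(), start + maxWidth);
            for (int end = start; end < limit; end++) {
                String token = docTokens.get(end);
                if (wanted.contains(token)) {
                    seen.add(token);
                }
                if (seen.size() == wanted.size()) {
                    // All query terms covered: record the text as it appears.
                    spans.add(String.join(" ", docTokens.subList(start, end + 1)));
                    break;
                }
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("i", "like", "oranges", "and", "apples", "today");
        List<String> query = Arrays.asList("apples", "and", "oranges");
        System.out.println(findMatchedSpans(doc, query, 2)); // prints [oranges and apples]
    }
}
```

A real implementation would read positions from the index during scoring
instead of re-scanning tokens, but the recorded result per hit would be
the same kind of string.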

     karl





RE: How to Retrieve Found Term?

Posted by Edwin Lee <ed...@hotmail.com>.
Hi Karl,

Thanks for the response. I have looked at the Highlighter. Unfortunately, when I feed it a PhraseQuery, it seems to break the query up into its individual terms first, so it does not yield the result I would like. I have not looked at Searcher.explain yet though.

It's like this. Two types of searches are required: a single term with a wildcard, and a phrase query with non-zero slop. But either could return a large number of results, some of which are not what is wanted. So there needs to be an intermediate filter screen like this:
apples and oranges 25 hits found
oranges and apples 70 hits found
and apples oranges 5 hits found
...

That way we can choose not to display results that correspond to found phrases we are not interested in, and when we get to the display screen, it shows just the results we want.
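
A small sketch of how the counts for such a screen could be built once
the found phrase for each hit is known. It assumes some earlier step has
already produced one found-phrase string per hit, which is exactly the
part Lucene does not provide out of the box; the class and method names
are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of Lucene.
class FoundPhraseTally {

    // Counts how many hits correspond to each found phrase, keeping the
    // phrases in the order they were first seen.
    static Map<String, Integer> tally(List<String> foundPhrasePerHit) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String phrase : foundPhrasePerHit) {
            counts.merge(phrase, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One entry per hit; the values here are made up for the example.
        List<String> perHit = Arrays.asList(
                "apples and oranges",
                "oranges and apples",
                "apples and oranges");
        for (Map.Entry<String, Integer> e : tally(perHit).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue() + " hits found");
        }
    }
}
```

The user's selection on that screen can then be applied as a simple
filter over the same per-hit list before the results are displayed.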



Thanks,
Edwin



----------------------------------------
> Date: Sat, 19 Apr 2008 22:01:17 +0200
> From: karl.wettin@gmail.com
> To: java-user@lucene.apache.org
> Subject: Re: How to Retrieve Found Term?
> 



Re: How to Retrieve Found Term?

Posted by Karl Wettin <ka...@gmail.com>.
There is no built-in support in Lucene for that. You can take a look at
what Searcher.explain and the Highlighter do.

If you tell us why you want to do this perhaps we can come up with an
alternative solution.

            karl













