Posted to java-user@lucene.apache.org by Ramy Hardan <ja...@hardan.de> on 2004/02/07 23:32:35 UTC

Search Refinement Approaches

Hi,

From the javadocs and previous posts, it seems that search refinement
or 'search within search' is best done with a Filter. To fill the
Filter's BitSet with the results of a search, a HitCollector is the
obvious solution. Unfortunately, when using a HitCollector I have to
implement all the functionality the Hits class usually provides myself.

Is there an efficient way to do search refinement, preferably without
losing the Hits class? I can think of the following approaches:

- Don't use Hits: collect all scores and document numbers with a
  HitCollector and sort them by score after the search. Retrieve the
  needed documents from IndexReader via document number.
- Use Hits: Briefly examining the source reveals this possibility:
  subclass BitSet and override the boolean get(int bitIndex) method to
  additionally set the bit at bitIndex in another BitSet. Use this
  subclass in a Filter and initialize it with all ones (in the first
  search). This way I can tell which documents are tested by the
  IndexSearcher against the Filter by examining the second BitSet and
  use it as a Filter for the refining search. Here's a sketch of this
  for clarification:

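  import java.util.BitSet;

  // BitSet that records every set bit that is read through get().
  public class FilterBitSet extends BitSet {
    // Bits that were tested and found set; basis for the refining Filter.
    private final BitSet bitsForRefiningFilter = new BitSet();

    @Override
    public boolean get( int bitIndex ) {
      boolean result = super.get( bitIndex );
      if (result) bitsForRefiningFilter.set( bitIndex );
      return result;
    }

    public BitSet getBitsForRefiningFilter() {
      return bitsForRefiningFilter;
    }
  }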

  Is this really possible? (might be more of a question for dev)
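
A minimal sketch of the first approach (not using Hits), written against
plain java.util so it stands alone rather than against the Lucene API.
ScoredDocCollector and its methods are hypothetical names; only the
collect(int doc, float score) shape is borrowed from HitCollector. It
keeps every hit, sorts by descending score afterwards, and records the
matching document numbers in a BitSet that could back a refining Filter.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for a HitCollector subclass; not part of Lucene.
final class ScoredDocCollector {
    // One collected hit: document number plus its score.
    static final class ScoredDoc {
        final int doc;
        final float score;
        ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final List<ScoredDoc> hits = new ArrayList<>();
    private final BitSet matched = new BitSet();

    // Same shape as HitCollector.collect(int doc, float score).
    void collect(int doc, float score) {
        hits.add(new ScoredDoc(doc, score));
        matched.set(doc);
    }

    // Document numbers by descending score; ties broken by ascending number.
    int[] docsByScore() {
        List<ScoredDoc> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
        });
        int[] docs = new int[sorted.size()];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = sorted.get(i).doc;
        }
        return docs;
    }

    // Copy of the matched document numbers, usable as filter bits later.
    BitSet matchedDocs() {
        return (BitSet) matched.clone();
    }
}
```

The documents themselves would then be fetched by number from the
IndexReader, as described in the first bullet.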

Last question about document numbers:
When and how exactly do they change? The javadoc states they change
upon addition and deletion. May I assume that a particular document
number is stable as long as it is not changed (deleted and added)
although other documents are added/deleted and optimize() is NOT
called? If yes, is this about to change in the foreseeable future?

Thanks in advance

Ramy

  



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Search Refinement Approaches

Posted by Ramy Hardan <ja...@hardan.de>.
Sunday, February 8, 2004, 4:19:59 AM, Erik Hatcher wrote:

> On Feb 7, 2004, at 5:32 PM, Ramy Hardan wrote:
>> Is there an efficient way to do search refinement, preferably without
>> losing the Hits class?

> I'm not quite following your Filter questions, but QueryFilter seems to
> fit the bill for what you are trying to do.  Just keep around the 
> previous query, and filter on it for successive searches.

First, thanks for your answer. Basically QueryFilter provides what I
need. But isn't the search actually executed twice, once for
retrieving the Hits and once for creating the QueryFilter instance if
refinement is needed afterwards? This is what I am trying to avoid. I
see that for the same query and an unmodified index I can reuse a
QueryFilter, but this is quite unlikely in my scenario. Additionally,
QueryFilter doesn't seem to be ready for multiple refinement steps
(like searching for printers - HP - Laser - more than 16 ppm).
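
One way to picture multiple refinement steps is to keep a BitSet of
matching document numbers per step and intersect them. The sketch below
uses only java.util.BitSet; RefinementChain and its methods are
hypothetical names, and it assumes the index is not modified between
steps so that document numbers stay comparable.

```java
import java.util.BitSet;

// Hypothetical helper, not part of Lucene: narrows a result set step by step.
final class RefinementChain {
    private final BitSet current;

    // Start from the documents matched by the first search.
    RefinementChain(BitSet initialMatches) {
        current = (BitSet) initialMatches.clone();
    }

    // Keep only documents that also match the next refinement step.
    RefinementChain refine(BitSet stepMatches) {
        current.and(stepMatches);
        return this;
    }

    // Copy of the document numbers that survived every step so far.
    BitSet result() {
        return (BitSet) current.clone();
    }

    // Convenience: builds a BitSet from a list of document numbers.
    static BitSet bits(int... docs) {
        BitSet set = new BitSet();
        for (int doc : docs) {
            set.set(doc);
        }
        return set;
    }
}
```

For the printers - HP - Laser example, each step would contribute the
bits of its own search, and result() after the last refine() call would
hold the documents matching all of them.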

I'll try to implement different approaches, profile them and come back
with some evidence rather than bothering you with my speculations.
Thanks

Ramy




Re: Search Refinement Approaches

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 7, 2004, at 5:32 PM, Ramy Hardan wrote:
> Is there an efficient way to do search refinement, preferably without
> losing the Hits class?

I'm not quite following your Filter questions, but QueryFilter seems to 
fit the bill for what you are trying to do.  Just keep around the 
previous query, and filter on it for successive searches.

> Last question about document numbers:
> When and how exactly do they change? The javadoc states they change
> upon addition and deletion. May I assume that a particular document
> number is stable as long as it is not changed (deleted and added)
> although other documents are added/deleted and optimize() is NOT
> called? If yes, is this about to change in the foreseeable future?

Document numbers change when a hole has been made by a delete and the 
index is optimized.  So, I think your assumption is fine, but 
personally I'm wary of relying on something potentially transient.  
Perhaps there is another way to accomplish what you are after?  A 
TermQuery is very fast, so maybe that could get you directly to a 
document of interest instead?
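
To illustrate the renumbering described above, here is a toy model in
plain Java. It is not Lucene code: ToySegment and its methods are
hypothetical, and compact() merely stands in for what optimizing does
to holes left by deletes. It also shows why looking a document up by a
stable key of your own is safer than holding on to its number.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model of an index; the list position plays the role of the document number.
final class ToySegment {
    private final List<String> ids = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    // Appends a document and returns its document number.
    int add(String id) {
        ids.add(id);
        return ids.size() - 1;
    }

    // Deleting only marks the slot; other document numbers stay put.
    void delete(int doc) {
        deleted.set(doc);
    }

    // Squeezes out deleted slots, so every later document shifts down.
    void compact() {
        List<String> kept = new ArrayList<>();
        for (int doc = 0; doc < ids.size(); doc++) {
            if (!deleted.get(doc)) {
                kept.add(ids.get(doc));
            }
        }
        ids.clear();
        ids.addAll(kept);
        deleted.clear();
    }

    // Current number of the live document with this key, or -1 if none.
    int docNumberOf(String id) {
        for (int doc = 0; doc < ids.size(); doc++) {
            if (!deleted.get(doc) && ids.get(doc).equals(id)) {
                return doc;
            }
        }
        return -1;
    }
}
```

In a real index the key lookup would be the TermQuery on a unique field
suggested above, rather than a linear scan.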

	Erik




Re: Search Refinement Approaches

Posted by Ramy Hardan <ja...@hardan.de>.
Hello Dror,

Sunday, February 8, 2004, 7:35:32 PM, you wrote:

> Hi Ramy,

> Maybe I'm misunderstanding the question, but wouldn't creating a query
> that ANDs the original query and the new one do what you want?

You are absolutely right. This approach would yield the desired
result. But what I'm concerned about is finding the most performant
way of achieving this. I'm afraid that queries become slower as they
get more complex (as the refinement levels increase). The bottom line
probably is that I have to implement the different strategies and
compare their performance.

Thanks and best regards

Ramy







Re: Search Refinement Approaches

Posted by Dror Matalon <dr...@zapatec.com>.
Hi Ramy,

Maybe I'm misunderstanding the question, but wouldn't creating a query
that ANDs the original query and the new one do what you want?

So if the original query was
foo bar

and the refinement is 
blah

create a new query that does:

(foo bar) AND (blah)

Seems a lot easier but maybe I'm missing something.
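
A small sketch of that idea as a string helper, assuming the combined
text is then handed to the query parser as usual. QueryRefiner is a
hypothetical name. Both parts are wrapped in parentheses so that
operators inside either part keep their scope, and the helper can be
applied again for each further refinement level.

```java
// Hypothetical helper, not part of Lucene: ANDs a refinement onto a query string.
final class QueryRefiner {
    private QueryRefiner() {}

    // Returns "(original) AND (refinement)" with surrounding whitespace trimmed.
    static String refine(String original, String refinement) {
        return "(" + original.trim() + ") AND (" + refinement.trim() + ")";
    }
}
```

With the original query foo bar and the refinement blah, refine()
produces the AND of the two parenthesized parts; calling it again on
that result adds the next level.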

Regards,

Dror

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
