Posted to dev@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2003/10/02 23:15:05 UTC

Query Term Collector (was: Re: New highlighter package available)

Korfut, Tatu (if you're watching),

I'm trying to understand what this term collector idea is all about, so
I looked online for some of your previous discussions on this topic
from March 2003.  So the patches that both of you sent to lucene-dev
at some point both implement a term collector.
What terms do your term collectors collect? Could you explain that in
simple terms, and with an example, please? (I almost broke one of the
walls in my apartment when I accidentally smacked it with my head 10
minutes ago.)

If I make a BooleanQuery: Laurel AND Hardy
What if I make a WildcardQuery: Comed*
What terms would your collector collect and return in each case?

I only saw Tatu's diff to the existing classes, and noticed that his
solution includes 5-6 new classes.
(http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02815/querySrc.zip)

Thanks,
Otis




--- none none <ko...@lycos.com> wrote:
> Hi Otis,
> as Tatu explained (sorry, I am pretty busy at work; thank you,
> Tatu!),
> we only ask for "Support of Term Collector", and this needs some
> changes in the core. The changes are in a previous email I sent to the
> list (I can send it again), in the form of a patch. With that in place
> it will be easier *for us* to provide highlighting when a new version
> of Lucene comes out.
> As for Mark's work on the highlighter, it does not work with release
> 1.3, due to big changes in the core (query rewrite, TermEnum, etc.).
> As Tatu said, there can be a waste of resources for users who do not
> need the term collector, so a boolean value will avoid that; by default
> we can set it to TERM_COLLECTOR_OFF.
> I had to go through (almost) all of the Lucene code to make it work in
> 1.3.
> That's all.
> Thanks.
> 
> Korfut.




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Query Term Collector (was: Re: New highlighter package available)

Posted by Tatu Saloranta <ta...@hypermall.net>.
On Thursday 02 October 2003 15:15, Otis Gospodnetic wrote:
> Korfut, Tatu (if you're watching),
>
> I'm trying to understand what this term collector idea is all about, so
> I looked online for some of your previous discussions on this topic
> from March 2003.  So the patches that both of you sent to lucene-dev
> at some point both implement a term collector.
> What terms do your term collectors collect? Could you explain that in
> simple terms, and with an example, please? (I almost broke one of the
> walls in my apartment when I accidentally smacked it with my head 10
> minutes ago.)
>
> If I make a BooleanQuery: Laurel AND Hardy
> What if I make a WildcardQuery: Comed*
> What terms would your collector collect and return in each case?
>
> I only saw Tatu's diff to the existing classes, and noticed that his
> solution includes 5-6 new classes.
> (http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02815/querySrc.zip)

Right. The new classes were for iterating over terms; the changes to the
Lucene core were fairly minor (mostly to allow using the visitor pattern to
traverse the query structure and to call the term collector).

In my code I created 3 iterator interfaces that allow traversing the query
structure, as well as both "logical" terms (the ones the Query has, before
expansion or rewrite) and "physical" terms, i.e. terms that actually exist in
the index (expanded query terms). For some queries (simple term query, phrase
query) there is no difference; for others (wildcard queries) there is. Now, if
I remember correctly, iteration happens at 2 levels: the main level is
queries, and then for each query one can iterate through its terms. This is
necessary for highlighting where different original query components
(different phrases etc.) need to be handled differently (different colouring
etc.).
My main goal was to present a uniform interface, so that application code
doesn't have to worry about different kinds of queries, while still having
full power to inspect the Query objects if it chooses to.
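
To make the logical/physical distinction concrete for the Comed* case, here is
a minimal self-contained sketch. It is not the code from the patch and does
not use Lucene at all: the class and method names (TermExpansionSketch,
logicalTerms, actualTerms) are hypothetical, a sorted set of strings stands in
for the index's term dictionary, and only a trailing "*" is handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Lucene.
class TermExpansionSketch {

    /** Logical term: exactly what the query holds before any expansion. */
    static List<String> logicalTerms(String queryText) {
        List<String> terms = new ArrayList<>();
        terms.add(queryText);
        return terms;
    }

    /** Physical terms: the index terms that the logical term expands to. */
    static List<String> actualTerms(String queryText, SortedSet<String> indexTerms) {
        List<String> result = new ArrayList<>();
        if (queryText.endsWith("*")) {
            String prefix = queryText.substring(0, queryText.length() - 1);
            // Terms sharing a prefix are contiguous in sorted order,
            // so scan from the prefix and stop at the first non-match.
            for (String term : indexTerms.tailSet(prefix)) {
                if (!term.startsWith(prefix)) {
                    break;
                }
                result.add(term);
            }
        } else if (indexTerms.contains(queryText)) {
            // A plain term maps to itself, if it is in the index at all.
            result.add(queryText);
        }
        return result;
    }

    public static void main(String[] args) {
        SortedSet<String> index = new TreeSet<>(
                Arrays.asList("comedian", "comedy", "hardy", "laurel"));
        System.out.println(logicalTerms("comed*"));       // [comed*]
        System.out.println(actualTerms("comed*", index)); // [comedian, comedy]
        System.out.println(actualTerms("laurel", index)); // [laurel]
    }
}
```

For a plain term such as "laurel" the two views coincide; for the wildcard
they differ, and it is the expanded list that a highlighter would have to
match against the document text.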

So, at a high level, the idea was something like:

QueryIterator qi = query.iterator();
while (qi.hasNext()) {
    // Loop over each separate "term query" (a query that directly contains terms)
    Query q = qi.nextQuery(); // In case the query itself is needed

    // Properties of the clause that contains the term (for - / +)
    boolean optional = qi.isOptional();
    boolean reqd = qi.isRequired();
    boolean prohib = qi.isProhibited();

    // Iterate over logical (base) terms:
    TermIterator logicalTermIt = qi.baseTermIterator();
    while (logicalTermIt.hasNext()) {
        Term t = logicalTermIt.nextTerm(); // Could display original terms etc.
    }

    // An IndexReader is needed to expand Terms
    TermIterator actualTermIt = qi.actualTermIterator(indexReader);
    while (actualTermIt.hasNext()) {
        Term t = actualTermIt.nextTerm(); // Advance, or the loop never ends
        // ... Collect all actual terms to match in the displayed doc?
    }
}

The example probably doesn't make much sense on its own, but the code would
allow for actual highlighting support, as well as fairly generic access to
queries, if one wants to display the query structure or such.
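
As one illustration of why the clause flags matter, here is a small
self-contained sketch of collecting terms for highlighting from a flattened
boolean query. Everything in it is an assumption made for the example: the
Clause class and collectHighlightTerms are hypothetical stand-ins for the
iterator interfaces above, and the policy of skipping prohibited clauses is
just one plausible choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the patch's API and not Lucene's.
class QueryWalkSketch {

    /** One leaf clause: a term plus the +/- flags of the clause holding it. */
    static final class Clause {
        final String term;
        final boolean required;
        final boolean prohibited;

        Clause(String term, boolean required, boolean prohibited) {
            this.term = term;
            this.required = required;
            this.prohibited = prohibited;
        }

        boolean isOptional() {
            return !required && !prohibited;
        }
    }

    /** Collects terms to highlight; prohibited clauses are skipped (assumed policy). */
    static List<String> collectHighlightTerms(List<Clause> clauses) {
        List<String> terms = new ArrayList<>();
        for (Clause clause : clauses) {
            if (!clause.prohibited) {
                terms.add(clause.term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Roughly "+laurel +hardy -chaplin"
        List<Clause> clauses = Arrays.asList(
                new Clause("laurel", true, false),
                new Clause("hardy", true, false),
                new Clause("chaplin", false, true));
        System.out.println(collectHighlightTerms(clauses)); // [laurel, hardy]
    }
}
```

A term from a prohibited clause cannot occur in a matching document, which is
why the sketch drops it; an application that wants to display the whole query
structure would keep all three clauses instead.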

I haven't really had time since then (nor immediate need) to work on getting 
support for highlighting, so I'm not really pushing my patches, if/when 
others have more current ones... but if anyone's interested in such 
approaches, I have the patch zip file (plus archives likely have it).

I also didn't realize back then that query rewrite (which I believe did exist 
even then) could be used to simplify the task... interesting approach, and 
I'm glad it works well enough to allow for highlighting to work. I haven't 
checked out Mark's patches but I think it's great someone took time to 
implement this often requested feature.

-+ Tatu +-

