You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Massimo Mannino <mm...@csc.com> on 2003/02/04 11:17:21 UTC

Schreiber's TermHighlighter/LuceneTools utility and the Demo...

From: Stone, Timothy
      Subject: Schreiber's TermHighlighter/LuceneTools utility and the
      Demo...
      Date: Wed, 11 Dec 2002 13:00:28 -0800

Has anyone successfully implemented Mark Schreiber's TermHighlighter
interface in the
demo? I'm seeking some pointers/samples.

Thanks in advance,
Tim


------------------------------------------------------------------



I work for CSC Italy and I' m developing a search engine for the intranet
of one of our customers based on Lucene.
I followed the steps described in de document that can be found on the site
http://www.iq-computing.de/. It works well. If anyone needs help or want to
see a sample just write to us.




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Schreiber's TermHighlighter/LuceneTools utility and the Demo...

Posted by Tatu Saloranta <ta...@hypermall.net>.
On Tuesday 04 February 2003 03:17, Massimo Mannino wrote:
> From: Stone, Timothy
>       Subject: Schreiber's TermHighlighter/LuceneTools utility and the
>       Demo...
>       Date: Wed, 11 Dec 2002 13:00:28 -0800
>
> Has anyone successfully implemented Mark Schreiber's TermHighlighter
> interface in the
> demo? I'm seeking some pointers/samples.

I was thinking of starting to implement it at some point, but haven't yet done 
it. However, reading through the sample code, I thought there are some things
that would be useful to add to Lucene Term class(es).

Most importantly, example traverses query hierarchy to get to terms used, by 
doing explicit casts and checks. This seems like a good place to use the
classic visitor pattern instead. So, Query.java should either have abstract
methods (to be implemented by deriving classes) to implement visitor
pattern (if I recall, something like "visitTerms(TermVisitor)" to start 
traversing,  and visit calls a method in TermVisitor for each term), or
alternatively (additionally), methods for getting Terms and sub-queries
contained. [my apologies if I description of visitor pattern is bit off... 
been a while since used it... but the main idea should be about right]

This would have to be included in core Lucene classes, but I think that would 
be a good (and simple to add) addition, to allow easily collecting all
Terms and/or sub-Queries given full query has.

After getting all terms, tokenizing should be easy, and one can create a data 
structure listing all spans that need highlighting. Then this has to be 
applied to the original un-tokenized content. Structure can (for example) 
just contain character ranges along with type of highlighting (different 
colours).
There are some potential tricky parts if HTML is to be highlighted; highlight 
markers still need to nest properly (if terms are not fully contained in same 
tag level), and some highlighted terms may not be visible (to resolve this, 
one needs to compare original text and analyzed text, to see which parts are 
removed by analyzer, but that match term(s)).
But even without getting it 100% correct, output should be fairly close.

-+ Tatu +-


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org