Posted to java-user@lucene.apache.org by ma...@yahoo.co.uk on 2004/07/26 22:58:29 UTC

Highlighter package updated with overlapping token support

I have updated the Highlighter code in CVS to support tokenizers that generate overlapping tokens.

The JUnit test rig has a new example test that uses a "SynonymTokenizer", which generates multiple tokens
in the same position for the same input token (e.g. the token "football" is expanded into the tokens "soccer", "footie" and "football").
The Formatter interface had to be changed to take a new "TokenGroup" object instead of a single token, but
I doubt any client code changes are required, because most people use the default Formatter implementation and haven't
created their own implementations.
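
For anyone unsure what "overlapping tokens" means in practice, here is a rough, self-contained sketch of the idea. It does not use the Lucene or Highlighter classes at all: Tok, Group, GroupFormatter, tokenize, group and highlight are hypothetical names made up for illustration, and the real SynonymTokenizer, TokenGroup and Formatter may well differ in detail. It only shows why a formatter benefits from receiving a group of tokens that cover the same stretch of text rather than a single token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class OverlappingTokenSketch {

    // Hypothetical stand-in for an analyzer token (NOT Lucene's Token class).
    // posIncrement == 0 means "same position as the previous token".
    public static final class Tok {
        public final String text; public final int start, end, posIncrement;
        Tok(String text, int start, int end, int posIncrement) {
            this.text = text; this.start = start; this.end = end; this.posIncrement = posIncrement;
        }
    }

    // Hypothetical stand-in for the idea behind "TokenGroup": all tokens
    // that cover the same (overlapping) span of the original text.
    public static final class Group {
        public final int start, end; public final List<String> terms;
        Group(int start, int end, List<String> terms) {
            this.start = start; this.end = end; this.terms = terms;
        }
    }

    // Hypothetical formatter that is handed a whole group, not one token.
    public interface GroupFormatter {
        String format(String originalText, Group group, boolean matched);
    }

    static final Map<String, List<String>> SYNONYMS =
            Map.of("football", List.of("soccer", "footie"));

    // Splits on non-alphanumerics and emits synonyms at the same offsets.
    public static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            if (i > start) {
                String word = text.substring(start, i).toLowerCase();
                out.add(new Tok(word, start, i, 1));
                for (String syn : SYNONYMS.getOrDefault(word, List.of())) {
                    out.add(new Tok(syn, start, i, 0)); // overlaps the original word
                }
            }
        }
        return out;
    }

    // Merges tokens whose offsets overlap the previous group into that group.
    public static List<Group> group(List<Tok> tokens) {
        List<Group> groups = new ArrayList<>();
        for (Tok t : tokens) {
            Group last = groups.isEmpty() ? null : groups.get(groups.size() - 1);
            if (last != null && t.start < last.end) {
                List<String> terms = new ArrayList<>(last.terms);
                terms.add(t.text);
                groups.set(groups.size() - 1,
                        new Group(last.start, Math.max(last.end, t.end), terms));
            } else {
                groups.add(new Group(t.start, t.end, new ArrayList<>(List.of(t.text))));
            }
        }
        return groups;
    }

    // Rebuilds the text, formatting each group once even if it holds several tokens.
    public static String highlight(String text, List<Group> groups,
                                   Set<String> queryTerms, GroupFormatter formatter) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (Group g : groups) {
            sb.append(text, last, g.start); // untouched text between groups
            boolean matched = g.terms.stream().anyMatch(queryTerms::contains);
            sb.append(formatter.format(text.substring(g.start, g.end), g, matched));
            last = g.end;
        }
        sb.append(text.substring(last));
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "I love football on Sundays";
        GroupFormatter bold = (orig, g, matched) -> matched ? "<B>" + orig + "</B>" : orig;
        List<Group> groups = group(tokenize(text));
        // A query for "soccer" highlights the word "football" exactly once.
        System.out.println(highlight(text, groups, Set.of("soccer"), bold));
    }
}
```

If the formatter were called once per token instead, the three tokens covering "football" would each try to mark up the same characters, which is the sort of problem a group-based interface avoids.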

Cheers
Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Highlighter package updated with overlapping token support

Posted by Karthik N S <ka...@controlnet.co.in>.
Hi Mark,

Apologies....

Could you please provide a URL on your main website page where users can
download the new version of the Highlighter package (jar / zip format)?

[Some developers may not have access to CVS downloads from the
Lucene sandbox because of organization restrictions.]

Thanks in advance.

With regards,
Karthik

-----Original Message-----
From: markharw00d@yahoo.co.uk [mailto:markharw00d@yahoo.co.uk]
Sent: Tuesday, July 27, 2004 2:28 AM
To: lucene-user@jakarta.apache.org
Subject: Highlighter package updated with overlapping token support


