You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by ma...@yahoo.co.uk on 2004/07/26 22:58:29 UTC
Highlighter package updated with overlapping token support
I have updated the Highlighter code in CVS to support tokenizers that generate overlapping tokens.
The Junit test rig has a new example test that uses a "SynonymTokenizer" which generates multiple tokens
in the same position for the same input token eg (the token "football" is expanded into tokens "soccer","footie" and "football").
The Formatter interface had to be changed to take a new "TokenGroup" object instead of a single token but
I doubt any code changes in clients are required because most people use the default Formatter implementation and haven't
created their own implementations.
Cheers
Mark
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Highlighter package updated with overlapping token support
Posted by Karthik N S <ka...@controlnet.co.in>.
Hi
Mark
Apologies....
Please Casn u Provide the URL for the Users to Dwnload the new
version
of Highlighter package ( jar / Zip format) from u'r main website page.
[ Because some of the developers may not have access to
CVS downloading (Organization restrictions) from Lucene - sandbox ]
Thx in advance
with regards
Karthik
-----Original Message-----
From: markharw00d@yahoo.co.uk [mailto:markharw00d@yahoo.co.uk]
Sent: Tuesday, July 27, 2004 2:28 AM
To: lucene-user@jakarta.apache.org
Subject: Highlighter package updated with overlapping token support
I have updated the Highlighter code in CVS to support tokenizers that
generate overlapping tokens.
The Junit test rig has a new example test that uses a "SynonymTokenizer"
which generates multiple tokens
in the same position for the same input token eg (the token "football" is
expanded into tokens "soccer","footie" and "football").
The Formatter interface had to be changed to take a new "TokenGroup" object
instead of a single token but
I doubt any code changes in clients are required because most people use the
default Formatter implementation and haven't
created their own implementations.
Cheers
Mark
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org