Posted to dev@lucene.apache.org by bu...@apache.org on 2005/06/21 23:08:38 UTC
DO NOT REPLY [Bug 35456] New: NGramFilter -- construct n-grams from a TokenStream
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=35456>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=35456
Summary: NGramFilter -- construct n-grams from a TokenStream
Product: Lucene
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Analysis
AssignedTo: lucene-dev@jakarta.apache.org
ReportedBy: apache-bugzilla@sebastian-kirsch.org
This filter constructs n-grams (token combinations up to a fixed size, sometimes
called "shingles") from a token stream.
The filter sets start offsets, end offsets, and position increments, so
highlighting and phrase queries should work.
Position increments > 1 in the input stream are replaced by filler tokens
(tokens with termText "_" and endOffset - startOffset = 0) in the output
n-grams. (Position increments > 1 in the input stream are usually caused by
removing some tokens, e.g. stopwords, from the stream.)
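To illustrate the idea (this is a minimal sketch of shingle construction, not
the attached filter's code; the class and method names here are made up, and
offsets/position increments are omitted): each output token is a run of up to
maxSize adjacent input tokens, joined by spaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper sketching how shingles (n-grams of adjacent tokens)
// are built from a token sequence. The real filter works incrementally on
// a TokenStream and also maintains offsets and position increments.
public class ShingleSketch {
    static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            StringBuilder sb = new StringBuilder();
            // Emit the unigram, bigram, ... up to maxSize starting at i.
            for (int n = 0; n < maxSize && i + n < tokens.size(); n++) {
                if (n > 0) sb.append(' ');
                sb.append(tokens.get(i + n));
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A removed stopword leaves a position gap, which the filter fills
        // with a "_" token; the filler then appears inside the shingles.
        System.out.println(shingles(List.of("please", "_", "sentence"), 2));
        // prints [please, please _, _, _ sentence, sentence]
    }
}
```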
The filter uses CircularFifoBuffer and UnboundedFifoBuffer from Apache
Commons-Collections.
Filter, test case and an analyzer are attached.
--
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org