Posted to java-user@lucene.apache.org by sham singh <sh...@gmail.com> on 2011/04/05 23:53:03 UTC
word + ngram tokenization
Hi All,
I need tokenization that combines n-gram and standard (word) tokenization.
For example, if the content is "the quick brown fox jumped over the lazy dog",
the requirement is to tokenize it into:
quick brown fox
brown fox jumped
fox jumped over
and so on.
Please help me find the best analyzer for this requirement.
Thanks in advance
--
Many Thanks,
Shambhu
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: word + ngram tokenization
Posted by Steven A Rowe <sa...@syr.edu>.
Hi Shambhu,
ShingleFilter will construct word n-grams:
http://lucene.apache.org/java/3_1_0/api/contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html
Steve
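
To make the term concrete: a word n-gram (shingle) is a run of n consecutive word tokens joined into one token. The sketch below is plain Java, not Lucene code, and only illustrates the output asked for above. The class WordShingles and its shingles helper are hypothetical names, and it assumes whitespace tokenization, lower-casing, and that "the" is dropped as a stop word without leaving a gap. Lucene's ShingleFilter may treat removed stop words differently, so check the linked Javadoc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration of word n-grams ("shingles"); not Lucene code.
final class WordShingles {

    // Hypothetical helper: lower-case the text, split on whitespace,
    // drop stop words, then join every run of `size` consecutive words
    // with a single space.
    static List<String> shingles(String text, int size, Set<String> stopWords) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!w.isEmpty() && !stopWords.contains(w)) {
                words.add(w);
            }
        }
        List<String> out = new ArrayList<>();
        // Slide a window of `size` words over the remaining tokens.
        for (int i = 0; i + size <= words.size(); i++) {
            out.add(String.join(" ", words.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumption: "the" is the only stop word in this example.
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        String text = "the quick brown fox jumped over the lazy dog";
        for (String s : shingles(text, 3, stop)) {
            System.out.println(s);
        }
    }
}
```

With the sentence from the question and a window of 3, the first three lines printed are "quick brown fox", "brown fox jumped" and "fox jumped over". In Lucene itself the equivalent would be a tokenizer and stop filter feeding ShingleFilter with the shingle size set to 3; the exact settings are in the Javadoc linked above.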