Posted to java-user@lucene.apache.org by sham singh <sh...@gmail.com> on 2011/04/05 23:53:03 UTC

word + ngram tokenization

Hi All,

I need to do tokenization that combines NGram and Standard tokenization.
For example, if the content is "the quick brown fox jumped over the lazy dog",
the requirement is to tokenize it into:
quick brown fox
brown fox jumped
fox jumped over
etc.

Please help me find the best analyzer for this requirement.

Thanks in advance

-- 
Many Thanks,
Shambhu

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: word + ngram tokenization

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Shambhu,

ShingleFilter will construct word n-grams:

http://lucene.apache.org/java/3_1_0/api/contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html

Steve
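
To make the target output concrete, here is a minimal plain-Java sketch of word n-grams (shingles) of size 3 for the sentence in the question. It is only an illustration of the expected tokens and does not use Lucene; in Lucene the same effect would come from wrapping a tokenizer in ShingleFilter. The class name WordShingles, its helper methods, and the one-word stop list are hypothetical, and dropping "the" is an assumption made because the example output in the question starts at "quick".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration of word n-grams ("shingles"); this is NOT Lucene code.
class WordShingles {

    // Assumption: a tiny stop-word list, chosen only so that "the" is dropped
    // as in the example from the question.
    static final Set<String> STOP_WORDS = new HashSet<String>(Arrays.asList("the"));

    // Split on whitespace, lower-case, and drop stop words.
    static List<String> words(String text) {
        List<String> out = new ArrayList<String>();
        for (String w : text.trim().split("\\s+")) {
            String lower = w.toLowerCase();
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    // Join every run of n consecutive words into one token.
    static List<String> shingles(List<String> words, int n) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + n <= words.size(); i++) {
            out.add(String.join(" ", words.subList(i, i + n)));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumped over the lazy dog";
        for (String s : shingles(words(text), 3)) {
            System.out.println(s);
        }
    }
}
```

Running main prints one three-word shingle per line, starting with "quick brown fox". Changing the second argument of shingles gives other n-gram sizes.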
