Posted to java-user@lucene.apache.org by Priyanka Tufchi <pr...@launchship.com> on 2015/05/05 14:17:42 UTC

Finding Issue with NgramAnalyzer in Apache Lucene

Hi all,

I am trying to use Apache Lucene to split text into n-grams.


Reader reader = new StringReader("This is a test string");
// minGram = 1, maxGram = 3
NGramTokenizer gramTokenizer = new NGramTokenizer(reader, 1, 3);

CharTermAttribute charTermAttribute =
    gramTokenizer.addAttribute(CharTermAttribute.class);
gramTokenizer.reset();

while (gramTokenizer.incrementToken()) {
    String token = charTermAttribute.toString();
    System.out.println(token);
}
gramTokenizer.end();
gramTokenizer.close();

This is the code I used, but it returns n-grams of characters. I want
n-grams of words instead, such as: this, test, string, this test, etc.
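
To make the desired output concrete, here is a minimal plain-Java sketch
(not Lucene code) of word-level n-grams. It assumes the n-grams are meant
to be runs of adjacent words separated by whitespace. The class name
WordNGrams and the method wordNGrams are hypothetical names used only for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordNGrams {
    // Returns every run of minSize..maxSize adjacent words,
    // ordered by the position of the first word in the run.
    // Assumption: words are separated by whitespace.
    public static List<String> wordNGrams(String text, int minSize, int maxSize) {
        List<String> result = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return result; // no words, so no n-grams
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            for (int size = minSize;
                 size <= maxSize && start + size <= words.length;
                 size++) {
                String[] run = Arrays.copyOfRange(words, start, start + size);
                result.add(String.join(" ", run));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String gram : wordNGrams("This is a test string", 1, 2)) {
            System.out.println(gram);
        }
    }
}
```

With sizes 1 to 2 this prints, one per line: This, This is, is, is a, a,
a test, test, test string, string.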


===================
I also tried ShingleFilter, but it throws a NullPointerException:

Reader reader = new StringReader("This is a test string");
TokenStream tokenizer = new StandardTokenizer(Version.LUCENE_41, reader);
tokenizer = new ShingleFilter(tokenizer, 2, 3);
CharTermAttribute charTermAttribute =
    tokenizer.addAttribute(CharTermAttribute.class);

while (tokenizer.incrementToken()) {
    String token = charTermAttribute.toString();
    System.out.println(token);
}

Please advise.


Thanks
