Posted to java-user@lucene.apache.org by Ralf Bierig <ra...@gmail.com> on 2015/02/04 13:45:11 UTC
Analyzer: Access to document?
Hi all,
An Analyzer gets access to content on a per-field level by overriding
this method:
    protected TokenStreamComponents createComponents(String fieldName,
                                                     Reader reader);
Is it possible to get at the enclosing document from there? I want to
collect the text content of the entire document within my analyzer so
that it can be processed by an external component.
Best,
Ralf
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Analyzer: Access to document?
Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Ralf,
Does the following code fragment work for you?
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Modified from: http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/analysis/package-summary.html
 */
public List<String> getAnalyzedTokens(String text) throws IOException {
    final List<String> list = new ArrayList<>();
    // analyzer() is assumed to return the Analyzer in use.
    try (TokenStream ts = analyzer().tokenStream("field", new StringReader(text))) {
        final CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
        ts.reset(); // Resets this stream to the beginning. (Required)
        while (ts.incrementToken()) {
            list.add(termAtt.toString());
        }
        ts.end(); // Perform end-of-stream operations, e.g. set the final offset.
    }
    return list;
}
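As far as I know, createComponents only ever sees one field at a time, so an Analyzer has no handle on the enclosing Document. If the goal is to pass the whole document's text to an external component, one option is to collect that text in the indexing code, before the fields are added to the Document. Below is a minimal sketch in plain Java, assuming the field values are available as strings. DocumentTextCollector and collect are hypothetical names, not part of Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not a Lucene class): gathers the text of all fields of one document. */
final class DocumentTextCollector {

    private DocumentTextCollector() {
    }

    /** Joins the non-empty field values in insertion order, separated by newlines. */
    static String collect(Map<String, String> fields) {
        StringJoiner joined = new StringJoiner("\n");
        for (String value : fields.values()) {
            if (value != null && !value.isEmpty()) {
                joined.add(value);
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Field names and values are made up for illustration.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Analyzer: Access to document?");
        fields.put("body", "Collect the text content from the entire document.");

        String fullText = collect(fields);
        // Hand fullText to the external component here, then add the
        // individual fields to the Lucene Document as usual.
        System.out.println(fullText);
    }
}
```

The combined text can also be run through getAnalyzedTokens above if the external component expects tokens rather than raw text.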