Posted to java-user@lucene.apache.org by "Xu, Ningfeng" <nf...@yahoo.com> on 2013/01/17 05:23:08 UTC

Access the whole TokenStream not just one by one in a TokenFilter-like processor

Hi,

I can write a CharFilter, similar to org.apache.solr.analysis.PatternReplaceCharFilter, in which I can access the full content of the CharStream. This is important because I might want to do pattern matching on the content and then, based on the match, modify part of the full content before tokenization.
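
To make the CharFilter side concrete, here is a rough plain-Java sketch of the idea: read everything from the input, run a regex over the full text, and hand a new Reader to the tokenizer. It deliberately uses only java.io and java.util.regex; the class and method names (WholeContentRewrite, readAll, rewrite) are made up for illustration and are not Lucene or Solr API. A real CharFilter would also have to take care of offset correction, which this sketch ignores.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Lucene/Solr: shows the "see the whole
// content first, then rewrite it" idea using only the JDK.
final class WholeContentRewrite {

    // Drain the reader into one String so a pattern can match across the whole text.
    static String readAll(Reader in) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Apply the pattern to the full content and expose the result as a new Reader,
    // which is what would be passed on to the tokenizer.
    static Reader rewrite(Reader in, Pattern pattern, String replacement) {
        String full = readAll(in);
        return new StringReader(pattern.matcher(full).replaceAll(replacement));
    }
}
```

For example, rewrite(reader, Pattern.compile("foo\\s+bar"), "foobar") would join "foo" and "bar" wherever they are separated only by whitespace, before any tokenization happens.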

Similarly, how can I write a TokenFilter-like processor that can access the TokenStream as a whole, instead of one token at a time as in TokenFilter.incrementToken()? This might be necessary because I might also want to do pattern matching on the TokenStream.
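
To illustrate what I am after, here is a rough plain-Java model of the behaviour, with tokens simplified to Strings and the stream to an Iterator. None of this is Lucene API: BufferingTokenProcessor and mergePair are hypothetical names. My guess is that a real TokenFilter could do the same by draining its input on the first incrementToken() call, saving each token with captureState() and replaying with restoreState(), but I do not know whether that is the recommended way.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a TokenFilter, NOT Lucene API: tokens are plain Strings.
final class BufferingTokenProcessor implements Iterator<String> {
    private final Iterator<String> input;
    private final UnaryOperator<List<String>> rewriteAll;
    private Iterator<String> replay; // null until the upstream has been drained

    BufferingTokenProcessor(Iterator<String> input, UnaryOperator<List<String>> rewriteAll) {
        this.input = input;
        this.rewriteAll = rewriteAll;
    }

    // On first use, consume the whole upstream, rewrite it in one pass,
    // and keep the result for token-by-token replay.
    private void fill() {
        if (replay != null) {
            return;
        }
        List<String> all = new ArrayList<>();
        while (input.hasNext()) {
            all.add(input.next());
        }
        replay = rewriteAll.apply(all).iterator();
    }

    @Override
    public boolean hasNext() {
        fill();
        return replay.hasNext();
    }

    @Override
    public String next() {
        fill();
        return replay.next();
    }

    // Example whole-stream rewrite: merge each adjacent (first, second) pair into one token.
    static List<String> mergePair(List<String> tokens, String first, String second) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            boolean pair = i + 1 < tokens.size()
                    && tokens.get(i).equals(first)
                    && tokens.get(i + 1).equals(second);
            if (pair) {
                out.add(first + "_" + second);
                i++; // skip the second half of the merged pair
            } else {
                out.add(tokens.get(i));
            }
        }
        return out;
    }
}
```

Downstream consumers still pull one token at a time, but the rewrite step sees the complete list. The obvious cost is that the whole token stream of a field has to be held in memory.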

Here, "TokenFilter-like" means that I intend to put this processor in schema.xml in the same place as a TokenFilter.


Any ideas? Many thanks.