Posted to solr-user@lucene.apache.org by bjornbear <bj...@gmail.com> on 2011/04/12 16:06:49 UTC

Analysing all tokens in a stream

Hi

I would like to build a component that, during indexing, analyses all tokens
in a stream and adds metadata to a new field based on my analysis. I have
different tasks that I would like to perform, like basic classification and
certain more advanced phrase detection. How would I do this? A normal
TokenFilter can only look at one token at a time, but I need to access a larger
context.

I've noticed that there is a TeeSinkTokenFilter that might be useful in
some way, since its documentation says "It is also useful for doing things
like entity extraction or proper noun analysis", but I don't understand how.

Can someone help me with some super-simple stub or similar? What I'm looking
for is something like:

class MySmartFilter {

  public AnalyzeTokens(tokenList)
  {
    metadataTokens = DoTheAnalysis(tokenList);
    AddToField("metadata", metadataTokens);
  }
}

Any help is much appreciated!
Thanks
/Bjorn


Re: Analysing all tokens in a stream

Posted by Ahmet Arslan <io...@yahoo.com>.
> I would like to build a component that, during indexing, analyses all
> tokens in a stream and adds metadata to a new field based on my analysis.
> I have different tasks that I would like to perform, like basic
> classification and certain more advanced phrase detection. How would I do
> this? A normal TokenFilter can only look at one token at a time, but I
> need to access a larger context.
>
> I've noticed that there is a TeeSinkTokenFilter that might be useful in
> some way, since its documentation says "It is also useful for doing things
> like entity extraction or proper noun analysis", but I don't understand how.
>
> Can someone help me with some super-simple stub or similar? What I'm
> looking for is something like:
>
> class MySmartFilter {
>
>   public AnalyzeTokens(tokenList)
>   {
>     metadataTokens = DoTheAnalysis(tokenList);
>     AddToField("metadata", metadataTokens);
>   }
> }
>

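An UpdateRequestProcessor may help you here, since it sees the whole document
before it is indexed and can add fields to it:
http://wiki.apache.org/solr/UpdateRequestProcessor
The UIMA integration is an example of one:
http://wiki.apache.org/solr/SolrUIMA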
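
To make the idea concrete, here is a minimal sketch of the "analyse the whole
token list, then write a metadata field" step. It is plain Java with no Solr
dependency, so it only illustrates the logic. The class name MySmartAnalysis,
the phrase list, the naive tokenizer and the map-based document are all
assumptions made for illustration, not anything from Solr or from the original
post. In a real setup the same logic would probably live in the processAdd
method of a custom UpdateRequestProcessor, reading the source field from the
SolrInputDocument and adding the "metadata" field to it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical stand-in for the analysis step: it sees the whole token list
 * of one field value and returns metadata tokens for a separate field.
 */
final class MySmartAnalysis {

    // Illustrative phrase list; a real component would load its own resources.
    private static final Set<String> PHRASES = Set.of("new york", "open source");

    /** Lowercases and splits on non-letter/non-digit runs (a naive stand-in for a real analyzer). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Inspects adjacent token pairs, i.e. a context wider than a single token. */
    static List<String> analyzeTokens(List<String> tokens) {
        List<String> metadata = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String bigram = tokens.get(i) + " " + tokens.get(i + 1);
            if (PHRASES.contains(bigram)) {
                metadata.add("phrase:" + bigram.replace(' ', '_'));
            }
        }
        return metadata;
    }

    /** Returns a copy of the document (modelled as a plain map) with a "metadata" field added. */
    static Map<String, Object> enrich(Map<String, Object> doc, String sourceField) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        Object value = doc.get(sourceField);
        if (value != null) {
            out.put("metadata", analyzeTokens(tokenize(value.toString())));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("text", "Open source search in New York");
        // Prints the metadata tokens derived from the "text" field.
        System.out.println(enrich(doc, "text").get("metadata"));
    }
}
```

Doing the analysis at the document level, rather than inside a TokenFilter,
means the whole field value is available at once and the result can go to a
different field, which is what the question asks for.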