Posted to java-user@lucene.apache.org by Itai Peleg <pe...@gmail.com> on 2013/01/01 22:24:41 UTC

Re: adding attributes to TokenStream

That worked great :) thanks a lot for the quick reply!

I have another question - after I "flagged" all my special tokens (in my
case, the ones that are entities) is there an elegant way of counting how
many of them I have in a document? I found an ugly way to do that, but I'm
sure there's a better one.

Thanks in advance,
Itai


2012/12/31 Michael Sokolov <so...@ifactory.com>

> On 12/31/2012 11:39 AM, Itai Peleg wrote:
>
>> Hi all,
>>
>> Can someone please post a simple example showing how to add additional
>> attributes to tokens in a TokenStream (inside incrementToken, for example)?
>>
>> I'm working on entity extraction and want to flag specific tokens as
>> entities, but I'm having problems.
>>
>> Thanks in advance,
>> Itai
>>
> Here's a simple example of a filter that adds an attribute saying
> whether a token is "the":
>
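> import java.io.IOException;
> import org.apache.lucene.analysis.TokenFilter;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
>
> final class YourTokenStream extends TokenFilter {
>   private final YourAttribute att;
>   private final CharTermAttribute term;
>
>   public YourTokenStream (TokenStream upstream) {
>     super (upstream);  // TokenFilter stores the wrapped stream in 'input'
>     att = addAttribute (YourAttribute.class);
>     term = addAttribute (CharTermAttribute.class);
>   }
>
>   @Override
>   public boolean incrementToken () throws IOException {
>     if (input.incrementToken()) {
>       // Compare via toString(): buffer() can be longer than the term itself.
>       // Set the flag on every token so a stale "true" never carries over.
>       att.setIsAnEnglishArticle ("the".equals (term.toString()));
>       return true;
>     }
>     return false;
>   }
>
> }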
>
>
>

Re: adding attributes to TokenStream

Posted by Michael Sokolov <so...@ifactory.com>.
Sure ... The frequency count is maintained in the index to enable 
relevance scoring.  You can pull it out using a TermDocs, which 
enumerates this sort of information.  Sorry, I don't have example code 
handy for this.

-Mike
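
A library-free sketch of the counting step, for illustration only. It
assumes the flag set by the filter is read back while the stream is being
consumed. FlaggedToken and EntityCounter are hypothetical stand-ins, not
Lucene classes. With a real TokenStream the same loop would presumably be
reset(), then while (incrementToken()) with a check of the custom attribute,
then end() and close(). Note that the term frequency kept in the index counts
occurrences of a term, so it would only answer this question if the flagged
tokens were also indexed under a term of their own.

```java
import java.util.Iterator;

// Hypothetical stand-in for one token plus its custom flag. In Lucene the
// same information would live in CharTermAttribute and YourAttribute.
final class FlaggedToken {
    final String term;
    final boolean isEntity;

    FlaggedToken(String term, boolean isEntity) {
        this.term = term;
        this.isEntity = isEntity;
    }
}

final class EntityCounter {
    // Consumes the stream once and counts the tokens whose flag is set.
    static int countEntities(Iterator<FlaggedToken> stream) {
        int count = 0;
        while (stream.hasNext()) {          // plays the role of incrementToken()
            if (stream.next().isEntity) {   // plays the role of reading the attribute
                count++;
            }
        }
        return count;
    }
}
```

Counting while consuming needs only one pass over the document's tokens and
no extra state in the filter itself.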


On 1/1/2013 4:24 PM, Itai Peleg wrote:
> That worked great :) thanks a lot for the quick reply!
>
> I have another question - after I "flagged" all my special tokens (in my
> case, the ones that are entities) is there an elegant way of counting how
> many of them I have in a document? I found an ugly way to do that, but I'm
> sure there's a better one.
>
> Thanks in advance,
> Itai

