Posted to java-user@lucene.apache.org by Itai Peleg <pe...@gmail.com> on 2013/01/01 22:24:41 UTC
Re: adding attributes to TokenStream
That worked great :) thanks a lot for the quick reply!
I have another question - after I "flagged" all my special tokens (in my
case, the ones that are entities) is there an elegant way of counting how
many of them I have in a document? I found an ugly way to do that, but I'm
sure there's a better one.
Thanks in advance,
Itai
2012/12/31 Michael Sokolov <so...@ifactory.com>
> On 12/31/2012 11:39 AM, Itai Peleg wrote:
>
>> Hi all,
>>
>> Can someone please post a simple example showing how to add additional
>> attributes to a token in a TokenStream (inside incrementToken, for example)?
>>
>> I'm working on entity extraction and want to flag specific tokens as
>> entities, but I'm having problems.
>>
>> Thanks in advance,
>> Itai
>>
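> Here's a simple example of a filter that adds an attribute saying
> whether a token is "the"
>
> import java.io.IOException;
> import org.apache.lucene.analysis.TokenFilter;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
>
> // YourAttribute is a custom Attribute interface; it also needs a
> // matching YourAttributeImpl class, which is not shown here.
> final class YourTokenStream extends TokenFilter {
>     private final YourAttribute att;
>     private final CharTermAttribute term;
>
>     public YourTokenStream(TokenStream upstream) {
>         super(upstream);  // TokenFilter keeps the upstream stream as "input"
>         att = addAttribute(YourAttribute.class);
>         term = addAttribute(CharTermAttribute.class);
>     }
>
>     @Override
>     public boolean incrementToken() throws IOException {
>         if (!input.incrementToken()) {
>             return false;  // upstream is exhausted
>         }
>         // term.buffer() can be longer than the term, so compare term.toString().
>         // Set the flag on every token: attribute instances are reused.
>         att.setIsAnEnglishArticle("the".equals(term.toString()));
>         return true;
>     }
> }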
>
>
>
Re: adding attributes to TokenStream
Posted by Michael Sokolov <so...@ifactory.com>.
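Sure. The index stores how often each term occurs in each document (the
term frequency), because relevance scoring needs it. You can read it back
with a TermDocs (IndexReader.termDocs(Term) in Lucene 3.x), which
enumerates the documents containing a term together with the term's
frequency in each one. Sorry, I don't have example code handy for this.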
-Mike
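
For readers following along, here is a minimal sketch of the idea in the
reply above: an index that remembers, per term, how many times the term
occurs in each document. It is a toy model in plain Java and does not use
Lucene; TermFreqSketch, addDocument and freq are hypothetical names chosen
for illustration. How flagged entity tokens end up as indexed terms is left
open here, as it is in the thread.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of per-document term frequencies; not Lucene code. */
class TermFreqSketch {
    // term -> (docId -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    /** Records one occurrence of each token for the given document. */
    void addDocument(int docId, String... tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    /** Returns how often the term occurs in the document, or 0 if it is absent. */
    int freq(String term, int docId) {
        Map<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.getOrDefault(docId, 0);
    }

    public static void main(String[] args) {
        TermFreqSketch index = new TermFreqSketch();
        index.addDocument(0, "paris", "is", "in", "france", "paris");
        index.addDocument(1, "lucene", "is", "a", "library");
        System.out.println(index.freq("paris", 0)); // prints 2
        System.out.println(index.freq("paris", 1)); // prints 0
    }
}
```

Looking up one term for one document is a single map access here. In a real
index the equivalent step is to walk the term's postings until the document
of interest is reached and read the frequency stored there.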
On 1/1/2013 4:24 PM, Itai Peleg wrote:
> That worked great :) thanks a lot for the quick reply!
>
> I have another question - after I "flagged" all my special tokens (in my
> case, the ones that are entities) is there an elegant way of counting how
> many of them I have in a document? I found an ugly way to do that, but I'm
> sure there's a better one.
>
> Thanks in advance,
> Itai