Posted to dev@lucene.apache.org by Doron Cohen <cd...@gmail.com> on 2007/12/27 00:20:42 UTC

SinkTokenizer: next(Token) vs. next()

Working on Lucene-1101 I checked if SinkTokenizer.next(Token) should also
call Token.clear(). (It shouldn't, because it ignores the input token.)

However I think that calls to next() would end up creating Tokens for
nothing (by TokenStream.next()).

May currently be an empty case (if all current uses call next(Token)), but
still - is it safer for SinkTokenizer to implement next() rather than
next(Token)?
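
To make the concern concrete, here is a minimal sketch of the situation, not the actual Lucene source. CountedToken, BaseStream and ReuseIgnoringSink are hypothetical stand-ins for Token, TokenStream and SinkTokenizer, and the sketch assumes that the base class's next() allocates a fresh token and passes it to next(Token), as described above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Lucene's Token; counts how many instances are created.
class CountedToken {
    static int created = 0;
    final String text;
    CountedToken(String text) { this.text = text; created++; }
}

// Hypothetical stand-in for TokenStream: each next method defaults to the other,
// so a subclass must override at least one of them.
abstract class BaseStream {
    CountedToken next() { return next(new CountedToken("")); } // allocates a scratch token
    CountedToken next(CountedToken reusable) { return next(); }
}

// Sink-like stream that overrides only next(Token) and ignores its argument.
class ReuseIgnoringSink extends BaseStream {
    private final Iterator<CountedToken> it;
    ReuseIgnoringSink(List<CountedToken> cached) { it = cached.iterator(); }
    @Override
    CountedToken next(CountedToken ignored) { return it.hasNext() ? it.next() : null; }
}

class WastedTokenDemo {
    // Returns how many tokens were allocated while draining the sink via next().
    static int allocationsWhileDraining(int cachedCount) {
        List<CountedToken> cached = new ArrayList<>();
        for (int i = 0; i < cachedCount; i++) cached.add(new CountedToken("t" + i));
        BaseStream sink = new ReuseIgnoringSink(cached);
        int before = CountedToken.created;
        while (sink.next() != null) { }
        return CountedToken.created - before;
    }

    public static void main(String[] args) {
        // One scratch token per next() call, including the final call that returns null.
        System.out.println(allocationsWhileDraining(3));
    }
}
```

In this sketch every call to next() builds a token that the sink then ignores, so the waste grows with the number of cached tokens.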

Re: SinkTokenizer: next(Token) vs. next()

Posted by Doron Cohen <cd...@gmail.com>.
On Dec 28, 2007 3:54 PM, Yonik Seeley <yo...@apache.org> wrote:

> On Dec 28, 2007 8:43 AM, Doron Cohen <cd...@gmail.com> wrote:
> > >
> > > > a TS must implement one of them. I see no harm in implementing
> > > > the two (but doing so is likely to just duplicate TokenStream's
> > > > code.)
> > >
> > > I don't think the contract was ever laid out so strictly.  I think
> > > it's fine for any TokenStream to implement both if it's advantageous
> > > to do so.
> > >
> >
> > From TokenStream's Javadocs:
> >   "subclasses must override at least one of next() or next(Token)."
>
> Meaning that it's also fine to override both.  We are agreeing, right?


Yes :-)

Doron

Re: SinkTokenizer: next(Token) vs. next()

Posted by Yonik Seeley <yo...@apache.org>.
On Dec 28, 2007 8:43 AM, Doron Cohen <cd...@gmail.com> wrote:
> >
> > > a TS must implement one of them. I see no harm in implementing
> > > the two (but doing so is likely to just duplicate TokenStream's code.)
> >
> > I don't think the contract was ever laid out so strictly.  I think
> > it's fine for any TokenStream to implement both if it's advantageous
> > to do so.
> >
>
> From TokenStream's Javadocs:
>   "subclasses must override at least one of next() or next(Token)."

Meaning that it's also fine to override both.  We are agreeing, right?

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: SinkTokenizer: next(Token) vs. next()

Posted by Doron Cohen <cd...@gmail.com>.
On Dec 28, 2007 4:10 PM, Grant Ingersoll <gs...@apache.org> wrote:

> I'm fine w/ making this change.  No sense in implementing both as we
> can just rely on next(Token) to call next().  I will commit the change
> and put a comment on the issue that created the SinkTokenizer.
>

Cool thanks!

Re: SinkTokenizer: next(Token) vs. next()

Posted by Grant Ingersoll <gs...@apache.org>.
I'm fine w/ making this change.  No sense in implementing both as we  
can just rely on next(Token) to call next().  I will commit the change  
and put a comment on the issue that created the SinkTokenizer.


-Grant

On Dec 28, 2007, at 8:43 AM, Doron Cohen wrote:

>>
>>> a TS must implement one of them. I see no harm in implementing
>>> the two (but doing so is likely to just duplicate TokenStream's  
>>> code.)
>>
>> I don't think the contract was ever laid out so strictly.  I think
>> it's fine for any TokenStream to implement both if it's advantageous
>> to do so.
>>
>
> From TokenStream's Javadocs:
>  "subclasses must override at least one of next() or next(Token)."





Re: SinkTokenizer: next(Token) vs. next()

Posted by Doron Cohen <cd...@gmail.com>.
>
> > a TS must implement one of them. I see no harm in implementing
> > the two (but doing so is likely to just duplicate TokenStream's code.)
>
> I don't think the contract was ever laid out so strictly.  I think
> it's fine for any TokenStream to implement both if it's advantageous
> to do so.
>

From TokenStream's Javadocs:
  "subclasses must override at least one of next() or next(Token)."

Re: SinkTokenizer: next(Token) vs. next()

Posted by Yonik Seeley <yo...@apache.org>.
On Dec 28, 2007 8:20 AM, Doron Cohen <cd...@gmail.com> wrote:
> The "contract" of the two next methods as I understand it is that
> a TS must implement one of them. I see no harm in implementing
> the two (but doing so is likely to just duplicate TokenStream's code.)

I don't think the contract was ever laid out so strictly.  I think
it's fine for any TokenStream to implement both if it's advantageous
to do so.

> For SinkTokenizer it actually implements next with no reuse logic,
> so it really should implement just next(). Then, if any consumer
> of SinkTokenizer calls next(Token), the default impl of this method
> in TokenStream would call SinkTokenizer's next().
>
> Do you agree with this?

I agree.  The current implementation is sub-optimal if the caller uses next().

-Yonik



Re: SinkTokenizer: next(Token) vs. next()

Posted by Doron Cohen <cd...@gmail.com>.
Hi Grant,

"safer" was not the best wording, sorry for that - I meant performance
wise, there's no correctness issue.

The "contract" of the two next methods as I understand it is that
a TS must implement one of them. I see no harm in implementing
the two (but doing so is likely to just duplicate TokenStream's code.)

SinkTokenizer actually implements next(Token) with no reuse logic,
so it really should implement just next(). Then, if any consumer
of SinkTokenizer calls next(Token), the default impl of this method
in TokenStream would call SinkTokenizer's next().

Do you agree with this?
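
A minimal sketch of this proposal, under the same assumption about the two default methods delegating to each other. SketchToken, SketchStream and NextOnlySink are hypothetical stand-ins, not the real Lucene classes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Lucene's Token; counts constructions.
class SketchToken {
    static int created = 0;
    final String text;
    SketchToken(String text) { this.text = text; created++; }
}

// Hypothetical stand-in for TokenStream with the two mutually delegating defaults.
abstract class SketchStream {
    SketchToken next() { return next(new SketchToken("")); }
    SketchToken next(SketchToken reusable) { return next(); }
}

// Sink-like stream that overrides only next(): it hands out cached tokens as-is.
class NextOnlySink extends SketchStream {
    private final Iterator<SketchToken> it;
    NextOnlySink(List<SketchToken> cached) { it = cached.iterator(); }
    @Override
    SketchToken next() { return it.hasNext() ? it.next() : null; }
}

class NextOnlyDemo {
    static List<SketchToken> cache() {
        return Arrays.asList(new SketchToken("a"), new SketchToken("b"));
    }

    // Tokens allocated while draining through next().
    static int allocationsViaNext() {
        SketchStream sink = new NextOnlySink(cache());
        int before = SketchToken.created;
        while (sink.next() != null) { }
        return SketchToken.created - before;
    }

    // Tokens allocated while draining through next(Token), which falls back
    // to the inherited default and therefore to the overridden next().
    static int allocationsViaReuse() {
        SketchStream sink = new NextOnlySink(cache());
        SketchToken reusable = new SketchToken("");
        int before = SketchToken.created;
        while (sink.next(reusable) != null) { }
        return SketchToken.created - before;
    }

    public static void main(String[] args) {
        System.out.println(allocationsViaNext() + " " + allocationsViaReuse());
    }
}
```

With only next() overridden, neither entry point allocates a scratch token in this sketch: next() is served directly, and next(Token) reaches it through the inherited default.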

Cheers,
Doron

On Dec 27, 2007 4:20 PM, Grant Ingersoll <gs...@apache.org> wrote:

>
> On Dec 26, 2007, at 6:20 PM, Doron Cohen wrote:
>
> > Working on Lucene-1101 I checked if SinkTokenizer.next(Token) should
> > also
> > call Token.clear(). (It shouldn't, because it ignores the input
> > token.)
> >
> > However I think that calls to next() would end up creating Tokens for
> > nothing (by TokenStream.next()).
> >
> > May currently be an empty case (if all current uses call
> > next(Token)), but
> > still - is it safer for SinkTokenizer to implement next() rather than
> > next(Token)?
>
> I'm still a bit fuzzy on the interplay of these myself, but what makes
> the call of SinkTokenizer.next(Token) unsafe or is it just the
> potential of Tokens being created?  I guess SinkTokenizer could just
> override both methods.
>
> -Grant
>

Re: SinkTokenizer: next(Token) vs. next()

Posted by Grant Ingersoll <gs...@apache.org>.
On Dec 26, 2007, at 6:20 PM, Doron Cohen wrote:

> Working on Lucene-1101 I checked if SinkTokenizer.next(Token) should  
> also
> call Token.clear(). (It shouldn't, because it ignores the input  
> token.)
>
> However I think that calls to next() would end up creating Tokens for
> nothing (by TokenStream.next()).
>
> May currently be an empty case (if all current uses call  
> next(Token)), but
> still - is it safer for SinkTokenizer to implement next() rather than
> next(Token)?

I'm still a bit fuzzy on the interplay of these myself, but what makes
calling SinkTokenizer.next(Token) unsafe? Or is it just the potential
for Tokens being created unnecessarily?  I guess SinkTokenizer could just
override both methods.

-Grant


