Posted to dev@lucene.apache.org by Doron Cohen <cd...@gmail.com> on 2007/12/27 00:20:42 UTC
SinkTokenizer: next(Token) vs. next()
Working on Lucene-1101 I checked if SinkTokenizer.next(Token) should also
call Token.clear(). (It shouldn't, because it ignores the input token.)
However I think that calls to next() would end up creating Tokens for
nothing (by TokenStream.next()).
May currently be an empty case (if all current uses call next(Token)), but
still - is it safer for SinkTokenizer to implement next() rather than
next(Token)?
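
The extra allocation described above can be illustrated with a small sketch. It uses simplified stand-in classes, not the real Lucene sources: Token and TokenStream here only mimic the two default methods (next() allocating a Token and passing it to next(Token), and next(Token) falling back to next()), and ReuseStyleSink and WastedTokenDemo are hypothetical names. It assumes a sink that overrides only next(Token) and ignores its argument.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for illustration; these are NOT the real Lucene classes.
class Token {
    static int created = 0;          // counts constructor calls for the demo
    final String text;
    Token() { this(""); }
    Token(String text) { this.text = text; created++; }
}

abstract class TokenStream {
    // Default next(): allocates a fresh Token and offers it for reuse.
    Token next() { return next(new Token()); }
    // Default next(Token): ignores the argument and falls back to next().
    Token next(Token reusable) { return next(); }
}

// Hypothetical sink that overrides only next(Token) and ignores its argument.
class ReuseStyleSink extends TokenStream {
    private final Iterator<Token> it;
    ReuseStyleSink(List<Token> cached) { it = cached.iterator(); }
    @Override Token next(Token ignored) { return it.hasNext() ? it.next() : null; }
}

class WastedTokenDemo {
    // Returns how many Tokens were allocated while draining the sink via next().
    static int allocationsWhileDraining(int n) {
        List<Token> cached = new ArrayList<>();
        for (int i = 0; i < n; i++) cached.add(new Token("t" + i));
        TokenStream sink = new ReuseStyleSink(cached);
        int before = Token.created;
        while (sink.next() != null) { /* drain */ }
        return Token.created - before;
    }

    public static void main(String[] args) {
        System.out.println(allocationsWhileDraining(3));
    }
}
```

In this sketch every call to next(), including the final one that returns null, allocates a Token that the sink then discards.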
Re: SinkTokenizer: next(Token) vs. next()
Posted by Doron Cohen <cd...@gmail.com>.
On Dec 28, 2007 3:54 PM, Yonik Seeley <yo...@apache.org> wrote:
> On Dec 28, 2007 8:43 AM, Doron Cohen <cd...@gmail.com> wrote:
> > >
> > > > a TS must implement one of them. I see no harm in implementing
> > > > the two (but doing so is likely to just duplicate TokenStream's code.)
> > >
> > > I don't think the contract was ever laid out so strictly. I think
> > > it's fine for any TokenStream to implement both if it's advantageous
> > > to do so.
> > >
> >
> > From TokenStream's Javadocs:
> > "subclasses must override at least one of next() or next(Token)."
>
> Meaning that it's also fine to override both. We are agreeing, right?
Yes :-)
Doron
Re: SinkTokenizer: next(Token) vs. next()
Posted by Yonik Seeley <yo...@apache.org>.
On Dec 28, 2007 8:43 AM, Doron Cohen <cd...@gmail.com> wrote:
> >
> > > a TS must implement one of them. I see no harm in implementing
> > > the two (but doing so is likely to just duplicate TokenStream's code.)
> >
> > I don't think the contract was ever laid out so strictly. I think
> > it's fine for any TokenStream to implement both if it's advantageous
> > to do so.
> >
>
> From TokenStream's Javadocs:
> "subclasses must override at least one of next() or next(Token)."
Meaning that it's also fine to override both. We are agreeing, right?
-Yonik
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: SinkTokenizer: next(Token) vs. next()
Posted by Doron Cohen <cd...@gmail.com>.
On Dec 28, 2007 4:10 PM, Grant Ingersoll <gs...@apache.org> wrote:
> I'm fine w/ making this change. No sense in implementing both as we
> can just rely on next(Token) to call next(). I will commit the change
> and put a comment on the issue that created the SinkTokenizer.
>
Cool thanks!
Re: SinkTokenizer: next(Token) vs. next()
Posted by Grant Ingersoll <gs...@apache.org>.
I'm fine w/ making this change. No sense in implementing both as we
can just rely on next(Token) to call next(). I will commit the change
and put a comment on the issue that created the SinkTokenizer.
-Grant
On Dec 28, 2007, at 8:43 AM, Doron Cohen wrote:
>>
>>> a TS must implement one of them. I see no harm in implementing
>>> the two (but doing so is likely to just duplicate TokenStream's
>>> code.)
>>
>> I don't think the contract was ever laid out so strictly. I think
>> it's fine for any TokenStream to implement both if it's advantageous
>> to do so.
>>
>
> From TokenStream's Javadocs:
> "subclasses must override at least one of next() or next(Token)."
Re: SinkTokenizer: next(Token) vs. next()
Posted by Doron Cohen <cd...@gmail.com>.
>
> > a TS must implement one of them. I see no harm in implementing
> > the two (but doing so is likely to just duplicate TokenStream's code.)
>
> I don't think the contract was ever laid out so strictly. I think
> it's fine for any TokenStream to implement both if it's advantageous
> to do so.
>
From TokenStream's Javadocs:
"subclasses must override at least one of next() or next(Token)."
Re: SinkTokenizer: next(Token) vs. next()
Posted by Yonik Seeley <yo...@apache.org>.
On Dec 28, 2007 8:20 AM, Doron Cohen <cd...@gmail.com> wrote:
> The "contract" of the two next methods as I understand it is that
> a TS must implement one of them. I see no harm in implementing
> the two (but doing so is likely to just duplicate TokenStream's code.)
I don't think the contract was ever laid out so strictly. I think
it's fine for any TokenStream to implement both if it's advantageous
to do so.
> For SinkTokenizer it actually implements next with no reuse logic,
> so it really should implement just next(). Then, if any consumer
> of SinkTokenizer calls next(Token), the default impl of this method
> in TokenStream would call SinkTokenizers' next().
>
> Do you agree with this?
I agree. The current implementation is sub-optimal if the caller uses next().
-Yonik
Re: SinkTokenizer: next(Token) vs. next()
Posted by Doron Cohen <cd...@gmail.com>.
Hi Grant,
"Safer" was not the best wording, sorry for that. I meant performance-wise;
there's no correctness issue.
The "contract" of the two next methods as I understand it is that
a TS must implement one of them. I see no harm in implementing
the two (but doing so is likely to just duplicate TokenStream's code.)
For SinkTokenizer it actually implements next with no reuse logic,
so it really should implement just next(). Then, if any consumer
of SinkTokenizer calls next(Token), the default impl of this method
in TokenStream would call SinkTokenizer's next().
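
A minimal sketch of that delegation, again with simplified stand-in classes rather than the real Lucene sources (NextOnlySink and NextOnlyDemo are hypothetical names, and the two default methods are assumed to behave as described in this thread). The sink overrides only next(); a consumer calling next(Token) reaches it through the inherited default, and neither path allocates anything.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for illustration; these are NOT the real Lucene classes.
class Token {
    static int created = 0;          // counts constructor calls for the demo
    final String text;
    Token() { this(""); }
    Token(String text) { this.text = text; created++; }
}

abstract class TokenStream {
    // Default next(): allocates a fresh Token and offers it for reuse.
    Token next() { return next(new Token()); }
    // Default next(Token): ignores the argument and falls back to next().
    Token next(Token reusable) { return next(); }
}

// Hypothetical sink that overrides only next() and hands back cached tokens.
class NextOnlySink extends TokenStream {
    private final Iterator<Token> it;
    NextOnlySink(List<Token> cached) { it = cached.iterator(); }
    @Override Token next() { return it.hasNext() ? it.next() : null; }
}

class NextOnlyDemo {
    private static TokenStream newSink() {
        return new NextOnlySink(Arrays.asList(new Token("a"), new Token("b")));
    }

    // Drains through next(Token) with one caller-owned Token; returns extra allocations.
    static int allocationsViaReuseApi() {
        TokenStream sink = newSink();
        Token scratch = new Token();
        int before = Token.created;
        while (sink.next(scratch) != null) { /* inherited next(Token) calls next() */ }
        return Token.created - before;
    }

    // Drains through next() directly; returns extra allocations.
    static int allocationsViaPlainApi() {
        TokenStream sink = newSink();
        int before = Token.created;
        while (sink.next() != null) { /* overridden next() returns cached tokens */ }
        return Token.created - before;
    }

    public static void main(String[] args) {
        System.out.println(allocationsViaReuseApi() + " " + allocationsViaPlainApi());
    }
}
```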
Do you agree with this?
Cheers,
Doron
On Dec 27, 2007 4:20 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
> On Dec 26, 2007, at 6:20 PM, Doron Cohen wrote:
>
> > Working on Lucene-1101 I checked if SinkTokenizer.next(Token) should also
> > call Token.clear(). (It shouldn't, because it ignores the input token.)
> >
> > However I think that calls to next() would end up creating Tokens for
> > nothing (by TokenStream.next()).
> >
> > May currently be an empty case (if all current uses call next(Token)), but
> > still - is it safer for SinkTokenizer to implement next() rather than
> > next(Token)?
>
> I'm still a bit fuzzy on the interplay of these myself, but what makes
> the call of SinkTokenizer.next(Token) unsafe or is it just the
> potential of Tokens being created? I guess SinkTokenizer could just
> override both methods.
>
> -Grant
>
Re: SinkTokenizer: next(Token) vs. next()
Posted by Grant Ingersoll <gs...@apache.org>.
On Dec 26, 2007, at 6:20 PM, Doron Cohen wrote:
> Working on Lucene-1101 I checked if SinkTokenizer.next(Token) should also
> call Token.clear(). (It shouldn't, because it ignores the input token.)
>
> However I think that calls to next() would end up creating Tokens for
> nothing (by TokenStream.next()).
>
> May currently be an empty case (if all current uses call next(Token)), but
> still - is it safer for SinkTokenizer to implement next() rather than
> next(Token)?
I'm still a bit fuzzy on the interplay of these myself, but what makes
the call of SinkTokenizer.next(Token) unsafe or is it just the
potential of Tokens being created? I guess SinkTokenizer could just
override both methods.
-Grant