Posted to dev@lucene.apache.org by Nitzan Shaked <ni...@gmail.com> on 2014/05/16 06:28:11 UTC

BaseTokenStreamTestCase

Hi all

While writing the unit tests for a new token filter I came across an
issue(?) with BaseTokenStreamTestCase.assertTokenStreamContents(): it goes
to some lengths to ensure that clearAttributes() was called for every token
produced by the filter under test.

I suppose this helps most of the time, but my filter sometimes produces
more than one output token for a given input token. I don't want to care
about what attributes the input token carries, and so I don't clear
attributes between producing the output tokens from a given input token: I
only change the attributes I care about (in my case this is charTerm right
now, and nothing else, not even positionIncrement).

This makes my unit tests unable to use
BaseTokenStreamTestCase.assertTokenStreamContents(). I certainly do not
want to add captureState() and "clearAttributes(); restoreState()"
calls just so I can pass the unit tests.

I would rather change assertTokenStreamContents to support my use case, by
adding a boolean and making the required changes everywhere else.

Thoughts?
Nitzan

Re: BaseTokenStreamTestCase

Posted by Robert Muir <rc...@gmail.com>.
It's not really a use case: you have to clear attributes when creating
a new token or you will have dirty state that is not appropriate...




Re: BaseTokenStreamTestCase

Posted by Nitzan Shaked <ni...@gmail.com>.
Got it. Will do so, and amend my JIRA ticket to include this as well as
tests.

Thanks!



RE: BaseTokenStreamTestCase

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

You have to capture state on the first token before inserting new ones. When inserting a new token, *solely* call restoreState(); clearAttributes() is not needed before restoreState().

If you don’t do this, your filter will work incorrectly if other filters come *after* it.

 

The assertion in BaseTokenStreamTestCase is therefore correct and really mandatory. There are many filters that show how to do this token insertion correctly.
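
To illustrate why a later stage in the chain matters, here is a small sketch in plain Java. It is a toy model and does not use Lucene: captureState() and restoreState() are named after the real methods on Lucene's AttributeSource, but TokenStateDemo, the map of attributes, downstream() and run() are all hypothetical stand-ins invented for this example. The assumption being modelled is that every stage in a chain shares one mutable set of attributes, so a later stage may have changed them by the time the inserting filter is asked for its next token.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: a shared mutable map stands in for the attributes of a token stream chain.
class TokenStateDemo {
    // One attribute set shared by every stage (assumption of this sketch).
    static final Map<String, String> attrs = new HashMap<>();

    // Snapshot of all attributes, loosely modelled on AttributeSource.captureState().
    static Map<String, String> captureState() {
        return new HashMap<>(attrs);
    }

    // Overwrite all attributes from a snapshot, loosely modelled on restoreState().
    static void restoreState(Map<String, String> state) {
        attrs.clear();
        attrs.putAll(state);
    }

    // Hypothetical later stage: records the token it receives, then mutates the shared attributes.
    static void downstream(List<String> seen) {
        seen.add(attrs.get("term") + "/" + attrs.get("type"));
        attrs.put("term", attrs.get("term").toUpperCase(Locale.ROOT));
        attrs.put("type", "KEYWORD");
    }

    // Emits two output tokens for one input token and returns what the later stage received.
    static List<String> run(boolean restore) {
        List<String> seen = new ArrayList<>();
        attrs.clear();
        attrs.put("term", "foo");   // the single input token
        attrs.put("type", "word");

        Map<String, String> saved = captureState(); // capture on the first token
        downstream(seen);                           // first output token passes through

        if (restore) {
            restoreState(saved);                    // start the inserted token from the captured state
        }
        attrs.put("term", attrs.get("term") + "_alt"); // change only the term, as in the original post
        downstream(seen);                              // second (inserted) output token
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("without restoreState: " + run(false));
        System.out.println("with restoreState:    " + run(true));
    }
}
```

Without the restore, the inserted token is built on top of whatever the later stage left behind (here an upper-cased term and a changed type); with it, the inserted token starts from the captured input token and differs only in the term. A real filter would follow the pattern used by existing Lucene filters rather than this model.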

 

Uwe

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: uwe@thetaphi.de

 
