Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2009/09/25 16:43:16 UTC

[jira] Created: (LUCENE-1926) Back compat break with old next() consumer API

Back compat break with old next() consumer API
----------------------------------------------

                 Key: LUCENE-1926
                 URL: https://issues.apache.org/jira/browse/LUCENE-1926
             Project: Lucene - Java
          Issue Type: Bug
          Components: Analysis
    Affects Versions: 2.9
            Reporter: Robert Muir
         Attachments: CaptureStateTestcase.java

There is a bug that causes TokenStreams to return different results depending on whether they are consumed with the incrementToken() API or the next() API.

I found this because the Solr analysis tool in the admin page uses the next() API, and I was seeing strange results.

I've created a test case to show the problem. When captureState() is called, the current state is erased, but only when consuming with the next() API.
If I consume with incrementToken(), things work.

{code}
State tempState = captureState(); // after we capture state here, things get strange.
String right = termAtt.term(); // when using old consumer API, this value is wrong!!!!
{code}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759582#action_12759582 ] 

Uwe Schindler commented on LUCENE-1926:
---------------------------------------

bq. One side note: I worry a bit now about the possibility that similar bugs might exist or crop up somewhere like shingle... but the tests might pass and they appear to be working

Shingle was reviewed and changed by me; I think this one is OK. The problem you described could have been caught by the backwards tests, but these only run for core, not contrib.



[jira] Issue Comment Edited: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759557#action_12759557 ] 

Uwe Schindler edited comment on LUCENE-1926 at 9/25/09 8:06 AM:
----------------------------------------------------------------

That's exactly the case. You should also capture the state in "case 1:". The attributes API does not guarantee that the attributes are preserved between calls to incrementToken() (just as the reusable Token API is not forced to always use the same reusable token). If you do not reuse tokens, this is exactly what happens: the Token instance in the wrapper is replaced, so the attribute contents get lost (empty token instance). One could fix this by an extra token cloning, but even with the old API (next(Token)) it would never have worked. Because of this, all Tokenizers *should* call clearAttributes() first to have a fresh start.

I am not sure if it worked correctly before LUCENE-1919.

ADDENDUM:
You should never rely on attributes being preserved between calls. If you plug another TokenFilter on top of your filter, that filter could change the tokens. Currently the tokens are only preserved 100% if you only use incrementToken() and your filter/Tokenizer is the only one modifying the tokens. You can never guarantee that.

This issue is a won't fix, as this is expected behaviour. OK with that?
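
The point about capturing state early can be illustrated with a small toy model. This is only a sketch and does not use the Lucene classes: ToyTermAttribute, RepeatFilter and CaptureStateSketch are hypothetical names invented here, the "state" is reduced to a single String, and the consumer that wipes the attribute between calls merely stands in for anything (such as the backwards-compatibility wrapper) that does not preserve attributes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for the shared attribute state; NOT Lucene's AttributeSource. */
final class ToyTermAttribute {
    private String term = "";
    String term() { return term; }
    void setTerm(String t) { term = t; }
    void clear() { term = ""; }
}

/** Hypothetical filter that emits every upstream token twice. */
final class RepeatFilter {
    private final ToyTermAttribute att;
    private final String[] words;       // plays the role of the upstream tokenizer
    private final boolean captureEarly; // true = snapshot while the token is current
    private int pos = 0;
    private boolean repeatPending = false;
    private String saved = null;

    RepeatFilter(ToyTermAttribute att, boolean captureEarly, String... words) {
        this.att = att;
        this.captureEarly = captureEarly;
        this.words = words;
    }

    boolean incrementToken() {
        if (repeatPending) {
            repeatPending = false;
            if (captureEarly) {
                att.setTerm(saved); // restore our own snapshot
            }
            // Naive variant: assumes the attribute still holds the previous token.
            return true;
        }
        if (pos >= words.length) {
            return false;
        }
        att.setTerm(words[pos++]);
        if (captureEarly) {
            saved = att.term(); // capture right after the token was produced
        }
        repeatPending = true;
        return true;
    }
}

final class CaptureStateSketch {
    /** Consumes the filter; optionally wipes the attribute before every call. */
    static List<String> consume(boolean captureEarly, boolean clearBetweenCalls) {
        ToyTermAttribute att = new ToyTermAttribute();
        RepeatFilter filter = new RepeatFilter(att, captureEarly, "a", "b");
        List<String> out = new ArrayList<>();
        while (true) {
            if (clearBetweenCalls) {
                att.clear();
            }
            if (!filter.incrementToken()) {
                break;
            }
            out.add(att.term());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consume(false, false)); // [a, a, b, b]  naive filter, lucky consumer
        System.out.println(consume(false, true));  // [a, , b, ]    naive filter loses the repeat
        System.out.println(consume(true, true));   // [a, a, b, b]  early capture survives
    }
}
```

In this model the naive filter only works as long as nobody touches the attribute between calls, which is the guarantee the comment above says does not exist.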



[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759605#action_12759605 ] 

Uwe Schindler commented on LUCENE-1926:
---------------------------------------

I checked: All TokenStreams in core/contrib pass the tests with a separate clearAttributes() before each call to incrementToken().



[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759613#action_12759613 ] 

Uwe Schindler commented on LUCENE-1926:
---------------------------------------

I do not think this is needed. clearAttributes() should be enough.



[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759610#action_12759610 ] 

Robert Muir commented on LUCENE-1926:
-------------------------------------

Uwe, I think it's a great idea to prevent future problems.

The only thing I could add, maybe overkill, would be to actually zero out the term buffer in addition to calling clearAttributes() in the base test case.
This might seem absurd, but I could have cached .termLength(); clearAttributes() only sets the length to zero, and a few analyzer tests only test for term text...
In that case it might still have slipped by...
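
To make the stale-buffer worry concrete, here is a toy sketch. It is not Lucene code: ToyCharTerm and StaleBufferSketch are hypothetical, the fixed 16-character buffer is an arbitrary assumption, and clear() only mimics the behaviour described above (length reset to zero, old characters left in place).

```java
import java.util.Arrays;

/** Toy term attribute backed by a reusable char buffer; NOT Lucene's TermAttribute. */
final class ToyCharTerm {
    private final char[] buffer = new char[16]; // assumption: terms fit in 16 chars
    private int length = 0;

    void set(String s) {
        s.getChars(0, s.length(), buffer, 0);
        length = s.length();
    }

    char[] buffer() { return buffer; }
    int length() { return length; }

    /** Resets only the length; the old characters stay in the buffer. */
    void clear() { length = 0; }

    /** Resets the length and overwrites the buffer, as suggested for the test case. */
    void clearAndZero() {
        length = 0;
        Arrays.fill(buffer, '\0');
    }
}

final class StaleBufferSketch {
    /** Reads the term with a length cached before the clear, as a buggy filter might. */
    static String readWithCachedLength(boolean zeroBuffer) {
        ToyCharTerm term = new ToyCharTerm();
        term.set("lucene");
        int cachedLength = term.length();
        if (zeroBuffer) {
            term.clearAndZero();
        } else {
            term.clear();
        }
        return new String(term.buffer(), 0, cachedLength);
    }

    public static void main(String[] args) {
        // Without zeroing, the stale text still looks right, so a test on term text passes.
        System.out.println(readWithCachedLength(false).equals("lucene")); // true
        // With zeroing, the same buggy read no longer yields the old term.
        System.out.println(readWithCachedLength(true).equals("lucene"));  // false
    }
}
```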



[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759585#action_12759585 ] 

Robert Muir commented on LUCENE-1926:
-------------------------------------

{quote}
Shingle was reviewed and changed by me; I think this one is OK. The problem you described could have been caught by the backwards tests, but these only run for core, not contrib.
{quote}

Uwe, again I apologize, thanks for explaining it to me. I thought I had found something when I saw different results from incrementToken() versus next(), but clearly all I found was a bug in my code :)



[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759602#action_12759602 ] 

Uwe Schindler commented on LUCENE-1926:
---------------------------------------

This is also a good idea, even better. It should simply call clearAttributes() before each incrementToken(). A real consumer would not do this, for speed reasons, but the test should.

bq. Right but maybe you could have implemented back compat differently, where it would appear to work with next() also. Or maybe at some point next() will go away?

As you said before, somebody else could also modify the attributes, not only the backwards layer. Preventing this would require an extra clone or some other copy of the attribute, which would cost speed.

next() and next(Token) will go away in the next few weeks...



[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759588#action_12759588 ] 

Robert Muir commented on LUCENE-1926:
-------------------------------------

bq. You did not waste my time, it was more my health. I got a heart attack when I read "Back compat break in old next()..." 

I had to grab your attention since I couldn't figure it out :) 
If you want, you should change the title to "Bug in Robert Muir's naive code"... I deserve it for giving you heart failures.
The javadoc patch you uploaded might help to prevent someone from creating similar bugs in the future.




[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759589#action_12759589 ] 

Mark Miller commented on LUCENE-1926:
-------------------------------------

bq. You did not waste my time, it was more my health. I got a heart attack when I read "Back compat break in old next()..." 

You weren't the only one :)



[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759606#action_12759606 ] 

Robert Muir commented on LUCENE-1926:
-------------------------------------

bq. It should simply call clearAttributes() before each incrementToken(). 

My thoughts too; this causes my test to fail with incrementToken(), exposing the bug.

I will update your patch with this one-liner once I let ant test finish, just to make sure it doesn't break the build and there aren't any similar bugs somewhere in contrib.




[jira] Updated: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1926:
--------------------------------

    Attachment: CaptureStateTestcase.java



[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759647#action_12759647 ] 

Uwe Schindler commented on LUCENE-1926:
---------------------------------------

Committed improved test, rev 818920



[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759566#action_12759566 ] 

Yonik Seeley commented on LUCENE-1926:
--------------------------------------

Yes, I think calling captureState() before incrementToken() doesn't make sense (as case:2 does) since the state would seem to be undefined at that point?



[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759572#action_12759572 ] 

Uwe Schindler commented on LUCENE-1926:
---------------------------------------

I think this info from next(Token) javadocs also applies to incrementToken():
{code}
   * Also, the producer must make no assumptions about a {@link Token} after it
   * has been returned: the caller may arbitrarily change it. If the producer
   * needs to hold onto the {@link Token} for subsequent calls, it must clone()
   * it before storing it. Note that a {@link TokenFilter} is considered a
   * consumer.
{code}
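
The clone-before-storing rule from that javadoc can be sketched with a toy example. None of this is Lucene code: ToyToken, CachingProducer and CloneSketch are hypothetical names, and the token carries only a text field for illustration.

```java
/** Toy mutable token; NOT Lucene's Token class. */
final class ToyToken implements Cloneable {
    String text;

    ToyToken(String text) { this.text = text; }

    @Override
    public ToyToken clone() {
        try {
            return (ToyToken) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen, the class is Cloneable
        }
    }
}

/** Hypothetical producer that wants to remember the last token it returned. */
final class CachingProducer {
    private final boolean cloneBeforeStoring;
    private ToyToken last;

    CachingProducer(boolean cloneBeforeStoring) {
        this.cloneBeforeStoring = cloneBeforeStoring;
    }

    ToyToken next(String text) {
        ToyToken token = new ToyToken(text);
        // Keep either a private copy or just another reference to the returned object.
        last = cloneBeforeStoring ? token.clone() : token;
        return token;
    }

    String lastText() { return last.text; }
}

final class CloneSketch {
    /** Returns what the producer remembers after the consumer has edited the token. */
    static String lastTextAfterConsumerEdit(boolean cloneBeforeStoring) {
        CachingProducer producer = new CachingProducer(cloneBeforeStoring);
        ToyToken token = producer.next("Foo");
        token.text = "changed by consumer"; // the caller may arbitrarily change it
        return producer.lastText();
    }

    public static void main(String[] args) {
        System.out.println(lastTextAfterConsumerEdit(false)); // changed by consumer
        System.out.println(lastTextAfterConsumerEdit(true));  // Foo
    }
}
```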



[jira] Resolved: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-1926.
---------------------------------

    Resolution: Won't Fix

this behavior really was not guaranteed as explained by Uwe... sorry to waste your time with this :)


[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759599#action_12759599 ] 

Robert Muir commented on LUCENE-1926:
-------------------------------------

{quote}
We could check the filters again by changing assertAnalyzesTo to consume the stream three times with all three APIs 
{quote}

Right but maybe you could have implemented back compat differently, where it would appear to work with next() also.
Or maybe at some point next() will go away?
Still as you said, there's a bug because something else could modify these attributes.
Maybe instead, in assertTokenStreamContents, after asserting the value is correct, it could do something like "zero out" the values?

This would probably detect bugs like this.
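
A rough sketch of that idea, under loud assumptions: TermAttr, Stream, WordSource and RepeatFilter below are toy stand-ins invented for illustration, not Lucene classes, and consume() only imitates what a helper like assertTokenStreamContents might do. The consumer optionally "zeroes out" the shared attribute after reading each token, so a filter that relies on the attribute surviving between calls no longer passes by accident.

```java
import java.util.ArrayList;
import java.util.List;

class ZeroingConsumerDemo {
    /** Toy stand-in for a term attribute shared by every stage of the chain. */
    static final class TermAttr { String term = ""; }

    /** Toy stand-in for the incrementToken() contract. */
    interface Stream { boolean incrementToken(); }

    /** Emits the given words one at a time. */
    static final class WordSource implements Stream {
        private final TermAttr att;
        private final String[] words;
        private int pos = 0;
        WordSource(TermAttr att, String... words) { this.att = att; this.words = words; }
        public boolean incrementToken() {
            att.term = ""; // plays the role of clearAttributes()
            if (pos == words.length) return false;
            att.term = words[pos++];
            return true;
        }
    }

    /** Emits every input token twice. */
    static final class RepeatFilter implements Stream {
        private final TermAttr att;
        private final Stream input;
        private final boolean restoreSavedTerm;
        private String saved = "";
        private boolean pending = false;
        RepeatFilter(TermAttr att, Stream input, boolean restoreSavedTerm) {
            this.att = att; this.input = input; this.restoreSavedTerm = restoreSavedTerm;
        }
        public boolean incrementToken() {
            if (pending) {
                pending = false;
                if (restoreSavedTerm) att.term = saved; // safe variant: restore our own copy
                return true; // unsafe variant trusts whatever is still in att.term
            }
            if (!input.incrementToken()) return false;
            saved = att.term;
            pending = true;
            return true;
        }
    }

    /** Collects the terms; optionally zeroes the attribute after each read. */
    static List<String> consume(boolean restoreSavedTerm, boolean zeroAfterRead, String... words) {
        TermAttr att = new TermAttr();
        Stream chain = new RepeatFilter(att, new WordSource(att, words), restoreSavedTerm);
        List<String> out = new ArrayList<>();
        while (chain.incrementToken()) {
            out.add(att.term);
            if (zeroAfterRead) att.term = "";
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consume(false, false, "abc")); // unsafe filter, polite consumer
        System.out.println(consume(false, true, "abc"));  // unsafe filter, zeroing consumer
        System.out.println(consume(true, true, "abc"));   // safe filter, zeroing consumer
    }
}
```

With a polite consumer the unsafe filter appears to work; once the consumer zeroes the attribute, the repeated token comes back empty, while the variant that restores its own saved copy is unaffected.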


[jira] Issue Comment Edited: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759606#action_12759606 ] 

Robert Muir edited comment on LUCENE-1926 at 9/25/09 9:18 AM:
--------------------------------------------------------------

bq. It should simply call clearAttributes() before each incrementToken(). 

My thoughts too; this causes my test to fail with incrementToken, exposing the bug.

I will update your patch with this one-liner once I let ant test finish, just to make sure it doesn't break the build and there aren't any similar bugs somewhere in contrib.

edit: nevermind, your computer is much faster than mine.


[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759567#action_12759567 ] 

Robert Muir commented on LUCENE-1926:
-------------------------------------

bq. This issue is won't fix, as expected behaviour. Ok with that?

Uwe, based upon what you said (additional filters could modify the tokens), I tend to agree with you, but it's weird that it only happens with the next() consumer API.
It does work with next(Token).

I still think it's not very obvious, and it's weird to see inconsistencies depending upon how things are consumed.





[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759586#action_12759586 ] 

Uwe Schindler commented on LUCENE-1926:
---------------------------------------

bq. this behavior really was not guaranteed as explained by Uwe... sorry to waste your time with this  :)

You did not waste my time, it was more my health. I got a heart attack when I read "Back compat break in old next()..." :)


[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759568#action_12759568 ] 

Uwe Schindler commented on LUCENE-1926:
---------------------------------------

bq. btw, my TestCase uses StandardTokenizer, which does call clearAttributes().

I have seen this; because of that, the attributes are cleared between step 0 and step 1. As you do not call incrementToken() on the underlying filter in step 2, the state seems to be preserved (in fact, you are the source of tokens for this step and should call clearAttributes() yourself).

The problem with relying on attribute state being preserved between calls to incrementToken() shows up even when you consume with incrementToken(), for example:

Just put a ReverseTokenFilter on top of this TokenFilter. This filter reverses the term. If you only consume with incrementToken() and rely on the tokens from the last call being preserved, you fail: the token is reversed by the reverse filter, so step 2 would see the reversed term text and not the forward one expected from step 1.

If you want to preserve state between incrementToken() calls, you have to capture the state. Maybe the Javadocs should be extended to clearly note that attribute contents may not be preserved between calls to incrementToken().
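
The reverse-filter scenario can be sketched with a toy model. Everything below (TermAttr, Stream, WordSource, RepeatFilter, ReverseFilter) is a hypothetical stand-in written for illustration, not the Lucene API; the saved String plays the role that captureState()/restoreState() would play in a real filter.

```java
import java.util.ArrayList;
import java.util.List;

class SharedAttributeDemo {
    /** Toy term attribute: one mutable instance shared by the whole chain. */
    static final class TermAttr { String term = ""; }

    /** Toy version of the incrementToken() contract. */
    interface Stream { boolean incrementToken(); }

    /** Emits the given words one at a time. */
    static final class WordSource implements Stream {
        private final TermAttr att;
        private final String[] words;
        private int pos = 0;
        WordSource(TermAttr att, String... words) { this.att = att; this.words = words; }
        public boolean incrementToken() {
            att.term = ""; // plays the role of clearAttributes()
            if (pos == words.length) return false;
            att.term = words[pos++];
            return true;
        }
    }

    /** Emits each input token twice ("step 1" and "step 2"). */
    static final class RepeatFilter implements Stream {
        private final TermAttr att;
        private final Stream input;
        private final boolean capture;
        private String saved = "";
        private boolean pending = false;
        RepeatFilter(TermAttr att, Stream input, boolean capture) {
            this.att = att; this.input = input; this.capture = capture;
        }
        public boolean incrementToken() {
            if (pending) { // step 2: no call to input.incrementToken()
                pending = false;
                if (capture) att.term = saved; // restore the state captured in step 1
                return true; // without capture, att.term is whatever downstream left there
            }
            if (!input.incrementToken()) return false;
            saved = att.term; // step 1: capture the state
            pending = true;
            return true;
        }
    }

    /** Reverses the term in place, like a reverse filter stacked on top. */
    static final class ReverseFilter implements Stream {
        private final TermAttr att;
        private final Stream input;
        ReverseFilter(TermAttr att, Stream input) { this.att = att; this.input = input; }
        public boolean incrementToken() {
            if (!input.incrementToken()) return false;
            att.term = new StringBuilder(att.term).reverse().toString();
            return true;
        }
    }

    /** Runs source -> repeat -> reverse and collects the terms the consumer sees. */
    static List<String> run(boolean capture, String... words) {
        TermAttr att = new TermAttr();
        Stream chain = new ReverseFilter(att, new RepeatFilter(att, new WordSource(att, words), capture));
        List<String> out = new ArrayList<>();
        while (chain.incrementToken()) out.add(att.term);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(false, "abc")); // relies on preserved attributes
        System.out.println(run(true, "abc"));  // restores captured state
    }
}
```

In the variant without capture, the second emission starts from the already reversed term and gets reversed back, so the consumer sees the forward text once; with the captured copy restored, both emissions come out reversed as intended.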


[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759553#action_12759553 ] 

Robert Muir commented on LUCENE-1926:
-------------------------------------

I looked into this some, and it appears the problem isn't due to captureState(); instead, my termAttribute is getting erased even before then.

I suspect this might be linked to the changes in LUCENE-1919.



[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759609#action_12759609 ] 

Uwe Schindler commented on LUCENE-1926:
---------------------------------------

bq. edit: nevermind, your computer is much faster than mine.

Not really, I used "ant test -Dtestpackage=analysis"

I will commit this addition to assertTokenStreamContents soon (the javadoc fix is already committed).


[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759557#action_12759557 ] 

Uwe Schindler commented on LUCENE-1926:
---------------------------------------

That's exactly the case. You should also capture the state in "case 1:". The attributes API does not guarantee that the attributes are preserved between calls to incrementToken() (just as the reusable Token API is not forced to always use the same reusable token). If you do not reuse tokens, this is exactly what happens: the Token instance in the wrapper is replaced, so the attribute contents get lost (empty token instance). One could fix this with an extra token clone, but even with the old API (next(Token)) it would never have worked. Because of this, all Tokenizers *should* call clearAttributes() first.

I am not sure whether it worked correctly before LUCENE-1919.


[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759570#action_12759570 ] 

Robert Muir commented on LUCENE-1926:
-------------------------------------

bq. Yes, I think calling captureState() before incrementToken() doesn't make sense (as case:2 does) since the state would seem to be undefined at that point?

This is because it's out of context (I had to narrow the test down).
The idea was that in case 2, I wanted to capture the unchanged TermAttribute from case 1 (since I felt that if I didn't call incrementToken, it would not be changed).



[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759575#action_12759575 ] 

Robert Muir commented on LUCENE-1926:
-------------------------------------

Uwe, what you are saying is true (it's really a bug in my filter, and I agree you should close this as won't fix; any javadoc clarification might prevent someone else from doing this).

One side note: I now worry a bit that similar bugs might exist or crop up somewhere like shingle, where the tests pass and things appear to be working.



[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759571#action_12759571 ] 

Robert Muir commented on LUCENE-1926:
-------------------------------------

bq. Maybe the Javadocs should be extended to clearly note that attribute contents may not be preserved between calls to incrementToken().

Uwe, yes. I expect that if I added a stemmer, or ReverseTokenFilter, or something similar, it would modify my termAttribute.
What I didn't expect is that the back compat layer would modify my termAttribute.


[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759563#action_12759563 ] 

Robert Muir commented on LUCENE-1926:
-------------------------------------

{quote}
The attributes API does not guarantee, that the attributes are preserved between calls to incrementToken
{quote}

Uwe, perhaps this is my understanding then. It's not obvious from the documentation that incrementToken will erase my attributes.

TokenStream now extends AttributeSource, which provides
 access to all of the token Attributes for the TokenStream.
 Note that only one instance per AttributeImpl is created and reused
 for every token. This approach reduces object creation and allows local
 caching of references to the AttributeImpls.

What else is "local caching of references to the AttributeImpls" supposed to mean?

btw, my TestCase uses StandardTokenizer, which does call clearAttributes().



[jira] Updated: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1926:
----------------------------------

    Attachment: LUCENE-1926.patch

This is an addition to javadocs (just copied from next(Token)).


[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759573#action_12759573 ] 

Uwe Schindler commented on LUCENE-1926:
---------------------------------------

bq. Uwe, yes. I expect that if I added a stemmer, or reversetokenfilter, or something, it would modify my termAttribute. What I didn't expect is that the back compat layer would modify my termAttribute.

OK, but this was the same with next(Token) (see above). You could not rely on the reusableToken being preserved; it could even be changed by the consumer.

You can implement your TokenFilter with next(reusableToken) and you will have the same problem if you rely on the reusableToken being preserved from the last call.
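
A minimal sketch of that pitfall, assuming hypothetical stand-in classes (Tok, Source) rather than the real Lucene Token and TokenStream. It only illustrates the contract described above: the instance passed to next(reusable) is overwritten on every call, so code that keeps a reference and reads it later sees the wrong term, while code that copies the value first does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins, NOT the Lucene classes.
class ReusableTokenSketch {

    // A mutable token holding only a term.
    static final class Tok {
        String term = "";
    }

    // A source that refills whatever token instance the caller hands in.
    static final class Source {
        private final Iterator<String> words;

        Source(String... words) {
            this.words = Arrays.asList(words).iterator();
        }

        // Mimics the next(reusableToken) style: the passed-in instance is overwritten.
        Tok next(Tok reusable) {
            if (!words.hasNext()) {
                return null;
            }
            reusable.term = words.next();
            return reusable;
        }
    }

    // Fragile: keeps references to the reusable token and reads them later.
    static List<String> collectByReference(Source in) {
        List<Tok> seen = new ArrayList<>();
        Tok reusable = new Tok();
        for (Tok t = in.next(reusable); t != null; t = in.next(reusable)) {
            seen.add(t);
        }
        List<String> terms = new ArrayList<>();
        for (Tok t : seen) {
            terms.add(t.term);
        }
        return terms;
    }

    // Safe: copies the term before asking for the next token.
    static List<String> collectByCopy(Source in) {
        List<String> terms = new ArrayList<>();
        Tok reusable = new Tok();
        for (Tok t = in.next(reusable); t != null; t = in.next(reusable)) {
            terms.add(t.term);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(collectByReference(new Source("a", "b", "c")));
        System.out.println(collectByCopy(new Source("a", "b", "c")));
    }
}
```

In this toy version every saved reference points at the same instance, so the by-reference variant reports only the last term, repeated once per token.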


[jira] Commented: (LUCENE-1926) Back compat break with old next() consumer API

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759594#action_12759594 ] 

Uwe Schindler commented on LUCENE-1926:
---------------------------------------

bq. One side note: I worry a bit now that similar bugs might exist or crop up somewhere like shingle... but the tests might pass and they appear to be working

We could check the filters again by changing assertAnalyzesTo to consume the stream three times, once with each of the three APIs :-)
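
A rough sketch of what such a check could look like, not the actual assertAnalyzesTo code. Everything here is hypothetical: an Iterator stands in for the token stream, and each "consumer" is a function standing in for one consumer API. The idea is only that every consumer gets a fresh stream and must produce the same expected terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper, NOT part of the Lucene test framework.
class MultiApiCheckSketch {

    // Runs every named consumer against a fresh stream and returns the names
    // of the consumers whose output differs from the expected terms.
    static <S> List<String> mismatches(Supplier<S> freshStream,
                                       Map<String, Function<S, List<String>>> consumers,
                                       List<String> expected) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Function<S, List<String>>> e : consumers.entrySet()) {
            List<String> actual = e.getValue().apply(freshStream.get());
            if (!expected.equals(actual)) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }

    // Stand-in for a consumer API that behaves correctly.
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // Stand-in for a consumer API with a bug: it loses the first token.
    static List<String> skipFirstThenDrain(Iterator<String> it) {
        if (it.hasNext()) {
            it.next();
        }
        return drain(it);
    }

    // Toy demonstration with one good and one buggy consumer.
    static List<String> demo() {
        Supplier<Iterator<String>> fresh = () -> Arrays.asList("quick", "fox").iterator();
        Map<String, Function<Iterator<String>, List<String>>> consumers = new LinkedHashMap<>();
        consumers.put("correct", MultiApiCheckSketch::drain);
        consumers.put("buggy", MultiApiCheckSketch::skipFirstThenDrain);
        return mismatches(fresh, consumers, Arrays.asList("quick", "fox"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Creating a fresh stream per consumer matters here, because a stream that has already been consumed once would make the later consumers fail for unrelated reasons.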
