Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2009/09/18 02:01:57 UTC

[jira] Created: (LUCENE-1919) Analysis back compat break

Analysis back compat break
--------------------------

                 Key: LUCENE-1919
                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
             Project: Lucene - Java
          Issue Type: Bug
            Reporter: Yonik Seeley
             Fix For: 2.9


Old and new style token streams don't mix well.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756870#action_12756870 ] 

Yonik Seeley commented on LUCENE-1919:
--------------------------------------

Background:
http://search.lucidimagination.com/search/document/4b2b4210e2516769/analysis_back_compat_break
http://search.lucidimagination.com/search/document/26c044ecbce3ed29



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756903#action_12756903 ] 

Jason Rutherglen commented on LUCENE-1919:
------------------------------------------

With the SOLR-908 CommonGramsQueryFilter, which uses the old API,
we've been seeing random negations of query clauses since we
upgraded to Solr 1.4/Lucene 2.9. It almost looks like there's
some sort of shared state or multithreading issue, but I've
also wondered whether it's related to mixing the old and new
APIs. Unfortunately it's so inconsistent that I don't have a test
case that reproduces it (it happens in production only).

Is there any sort of shared state in the analyzing, possibly
between instances that is fixed in this patch?



[jira] Issue Comment Edited: (LUCENE-1919) Analysis back compat break

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756898#action_12756898 ] 

Yonik Seeley edited comment on LUCENE-1919 at 9/17/09 6:38 PM:
---------------------------------------------------------------

edit: collision w/ robert.
Still wonder if it's safe to get rid of that second clone()... the combinations are mind-bending.

      was (Author: yseeley@gmail.com):
    Robert, you would need to handle the incrementToken() case too in next() - that's actually where the bug occurred in the Solr test.

{code}
    if (supportedMethods.hasIncrementToken) {
      tokenWrapper.delegate = new Token();
      return incrementToken() ? ((Token) tokenWrapper.delegate.clone()) : null;
{code}

Could we remove the clone()?  not sure...
  


[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757092#action_12757092 ] 

Uwe Schindler commented on LUCENE-1919:
---------------------------------------

I would suggest creating a new RC because of this issue. The TokenStream BW stuff is very tricky and is not limited to only one TokenStream. When fixing things here, you must always also look for issues that could arise from mixing the old and new APIs. The current issue is a typical example of that. Your brain is always "fuming" when you think about what happens if TF1 calls TF2 using the old API, but TF2 uses the new API and calls TF3 using the new API, while TF3 itself is again a very old API without reuse, and so on.

But this one is new: a reusable TF is calling another TS while mixing the APIs, and the changes also affect the other variants. So testing, testing, testing, and take a cold shower when your brain starts getting hot :-)

I will commit the current patch later in the afternoon, when you are awake on the West Coast.



[jira] Updated: (LUCENE-1919) Analysis back compat break

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1919:
--------------------------------

    Attachment: LUCENE-1919.patch

Now give us some better options Uwe :)



[jira] Issue Comment Edited: (LUCENE-1919) Analysis back compat break

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757007#action_12757007 ] 

Uwe Schindler edited comment on LUCENE-1919 at 9/18/09 12:01 AM:
-----------------------------------------------------------------

Good morning, everyone! What a nice day, and then this problem :-)

Here is my solution to the problem: Michael and I never thought about mixing the old and very old API as a consumer, which seems to be in use. The problem now is that the behaviour of calling next() changed. The original 2.4.1 code looks as follows:

{code}
public Token next() throws IOException {
  final Token reusableToken = new Token();
  Token nextToken = next(reusableToken);

  if (nextToken != null) {
    Payload p = nextToken.getPayload();
    if (p != null) {
      nextToken.setPayload((Payload) p.clone());
    }
  }

  return nextToken;
}
{code}

The difference is that a new token is created *before* the call to the reusable next(Token) method (incrementToken() is, in a sense, reusable too).

The attached patch restores exactly this functionality, but also for incrementToken(). It also removes an unneeded assignment to the delegate in the case of next(Token) delegating to next() (which is also a bug, because it makes the token no longer private to the caller: it can be overwritten by a later call to incrementToken()). Sorry for that.

Tokens generated by next() must always be private and must not be shared or reused as the reusableToken for later next/incrementToken calls by the API. This is the problem with Robert's patch: the fully private token (which was cloned before) is then also available as the delegate for later calls to incrementToken() (for next(Token) the delegate is replaced, as Robert noticed, so there is no problem there).

The wrapper around incrementToken() does the following:
- save the current delegate
- replace delegate by a new Token (just like the old next() in 2.4.1)
- call incrementToken and assign to nextToken variable
- restore the reusable delegate
- after that, everything proceeds as for next(Token)

Simply put: the next() wrapper is completely decoupled and always uses a completely private (new) Token instance.
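
Editor's sketch: the wrapper steps listed above can be illustrated with a small toy model. This is only a sketch under assumptions, not the code from the patch. SketchToken, SketchStream and PrivateNextSketch are hypothetical names, the delegate is modelled as a plain field, and the payload handling is left out.

```java
// Toy model of the wrapper steps; all names here are hypothetical, not Lucene's.
final class SketchToken {
    String term;
}

final class SketchStream {
    private final String[] terms;
    private int pos = 0;
    // Reusable token that new-style incrementToken() calls write into.
    SketchToken delegate = new SketchToken();

    SketchStream(String... terms) {
        this.terms = terms;
    }

    // New-style call: fills whatever token is currently installed as delegate.
    boolean incrementToken() {
        if (pos >= terms.length) {
            return false;
        }
        delegate.term = terms[pos++];
        return true;
    }

    // Old-style call that must hand back a private token.
    SketchToken next() {
        SketchToken saved = delegate;          // save the current delegate
        delegate = new SketchToken();          // replace it by a new token
        SketchToken nextToken = incrementToken() ? delegate : null;
        delegate = saved;                      // restore the reusable delegate
        return nextToken;                      // payload handling omitted in this sketch
    }
}

final class PrivateNextSketch {
    // Interleaves incrementToken() and next() and records what each token holds.
    static String[] demo() {
        SketchStream ts = new SketchStream("a", "b", "c");
        ts.incrementToken();
        SketchToken shared = ts.delegate;      // now holds "a"
        SketchToken priv = ts.next();          // private token holding "b"
        String sharedAfterNext = shared.term;
        ts.incrementToken();                   // writes "c" into the shared delegate
        return new String[] {sharedAfterNext, priv.term, shared.term};
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", demo()));
    }
}
```

In this toy model the token returned by next() is never installed as the delegate after the call returns, so a later incrementToken() cannot write into it, and the reusable delegate is untouched by next().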

I am not sure why this payload cloning code is in 2.4.1, but I moved it here, too. I think it is because of some old bug where a payload assigned in next(Token) was also shared by the TokenStream itself between more than one token. Using this code, the Token is for sure fully private (and is not even reused later, as it was before).

Using this patch, you should now even be able to mix all three APIs in one filter/consumer - but I still wouldn't do this :-)



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757143#action_12757143 ] 

Uwe Schindler commented on LUCENE-1919:
---------------------------------------

OK, will do. Do you know whether Yonik also reviewed this patch? He was the person who reported the bug, so maybe he can test with his failing TS. Jason could also check this patch against his problem (SOLR-908).

I will commit in 2 hours; is that early enough?



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756944#action_12756944 ] 

Robert Muir commented on LUCENE-1919:
-------------------------------------

{quote}
Is there any sort of shared state in the analyzing, possibly
between instances that is fixed in this patch?
{quote}

Yes. if for instance you call foo = ts.next(reusableToken), then call bar = ts.next() 
foo will be overwritten by bar in the second call.
this is because it is incorrectly "reused" in next()... see the testcase i attached for an example.
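
Editor's sketch: the overwrite described here can be reproduced with a toy stream that keeps writing into one shared token. This is an assumption-laden illustration, not the attached test case and not Lucene code. ToyToken, ToyStream and SharedTokenDemo are hypothetical names, and the real TokenStream wrapper logic is more involved.

```java
import java.util.Arrays;
import java.util.Iterator;

// Toy stand-ins for illustration only; these are NOT the Lucene classes.
final class ToyToken {
    String term;
}

final class ToyStream {
    private final Iterator<String> terms;
    // Internal token reused between calls, playing the role of the delegate.
    private ToyToken delegate = new ToyToken();
    private final boolean privateNext;

    ToyStream(boolean privateNext, String... terms) {
        this.terms = Arrays.asList(terms).iterator();
        this.privateNext = privateNext;
    }

    // Reusable-style call: the stream adopts the caller's token as its delegate.
    ToyToken next(ToyToken reusable) {
        if (!terms.hasNext()) {
            return null;
        }
        delegate = reusable;
        delegate.term = terms.next();
        return delegate;
    }

    // Non-reusable call: the caller expects a token that nobody else writes to.
    ToyToken next() {
        if (!terms.hasNext()) {
            return null;
        }
        if (privateNext) {
            ToyToken fresh = new ToyToken();   // private instance per call
            fresh.term = terms.next();
            return fresh;
        }
        delegate.term = terms.next();          // buggy: writes into the shared token
        return delegate;
    }
}

final class SharedTokenDemo {
    // Returns what foo holds after a later call to next().
    static String fooAfterSecondCall(boolean privateNext) {
        ToyStream ts = new ToyStream(privateNext, "foo", "bar");
        ToyToken foo = ts.next(new ToyToken());
        ts.next();
        return foo.term;
    }

    public static void main(String[] args) {
        System.out.println("shared token:  " + fooAfterSecondCall(false));
        System.out.println("private token: " + fooAfterSecondCall(true));
    }
}
```

With the shared token, foo ends up holding the term of the second call; with a private token per next() call it keeps its own term.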




[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756883#action_12756883 ] 

Mark Miller commented on LUCENE-1919:
-------------------------------------

At the worst, we can just clone the delegate and not be reusable (the javadoc says you don't have to be reusable)

Not ideal, but it will fix, and cease to be a problem in 3.0 :)



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756920#action_12756920 ] 

Robert Muir commented on LUCENE-1919:
-------------------------------------

bq. what if the tokenstream only supports next(reusableTS) ?

ok i tested the 2nd scenario and it is ok.
if you want my additional tests, i can add them, but the existing patch is fine... i confused myself worrying about this 2nd case.
in this case when the consumer calls next(reusableTS), no delegate is involved since it's overridden... duh :)





[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756909#action_12756909 ] 

Robert Muir commented on LUCENE-1919:
-------------------------------------

{quote}
edit: collision w/ robert.
Still wonder if it's safe to get rid of that second clone()... the combinations are mind-bending. 
{quote}

yonik, hmm i think the second clone() is a hint there remains another problem
if you look at my patch, it only fixes the case where you have a tokenstream supporting incrementToken(), and you use both next() and next(Token) apis.

what if the tokenstream only supports next(reusableTS) ?
if you call next(token) then next(), i think in that case you will have the same problem.
this still won't introduce any extra cloning, just fix the logic so it doesn't overwrite the tokenWrapper, and returns a "full private copy" like the javadocs say.

 (i'll add another test and upload a new patch)



[jira] Updated: (LUCENE-1919) Analysis back compat break

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1919:
--------------------------------

    Attachment: LUCENE-1919.patch

alternative patch, should not change performance.



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756898#action_12756898 ] 

Yonik Seeley commented on LUCENE-1919:
--------------------------------------

Robert, you would need to handle the incrementToken() case too in next() - that's actually where the bug occurred in the Solr test.

{code}
    if (supportedMethods.hasIncrementToken) {
      tokenWrapper.delegate = new Token();
      return incrementToken() ? ((Token) tokenWrapper.delegate.clone()) : null;
{code}

Could we remove the clone()?  not sure...



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757149#action_12757149 ] 

Mark Miller commented on LUCENE-1919:
-------------------------------------

I tested the patch I did with the failing test, and it worked - so I'm sure yours does too - I'll do a test with the latest patch right now though.

The original report came from Gregg Donovan in Solr land.

He made a nice little unit test that fails in Solr showing the problem (see the links Yonik posted).

https://issues.apache.org/jira/browse/SOLR-1445

2 hours is fine with me - I just would like to get the RC out today if possible. 



[jira] Assigned: (LUCENE-1919) Analysis back compat break

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reassigned LUCENE-1919:
-------------------------------------

    Assignee: Uwe Schindler



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757201#action_12757201 ] 

Uwe Schindler commented on LUCENE-1919:
---------------------------------------

Thanks Robert, I added your test and did some more tests with the round robin stream. We can now be sure that it is fully private; not even the attributes are touched :-) -> but I see no real sense in this requirement :-) People should never use the old and the brand new API in one consumer. Mixing old and very-old is OK and was obviously used.

I will commit shortly.

Just one question: Does anybody know why this extra payload cloning is there?



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757156#action_12757156 ] 

Mark Miller commented on LUCENE-1919:
-------------------------------------

I can confirm the latest patch fixes the Solr issue that prompted this.



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757215#action_12757215 ] 

Yonik Seeley commented on LUCENE-1919:
--------------------------------------

bq. This was introduced at the end of LUCENE-1057 without patch in JIRA. So it must have something to do with this. Maybe Yonik can explain.

Urg... 2 years ago.
I think it was because Token.clone() didn't clone the payload, so next() would do it to create a fully private copy.
Is it still applicable?  So many changes, I don't know.
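
Editor's sketch: if Token.clone() was indeed shallow with respect to the payload, as recalled above, the effect of the extra payload clone can be shown with toy classes. This assumes that behaviour and is not the Lucene Token or Payload implementation. DemoToken, DemoPayload and PayloadCloneDemo are hypothetical names.

```java
// Toy classes for illustration; not Lucene's Token or Payload.
final class DemoPayload implements Cloneable {
    byte[] data;

    DemoPayload(byte[] data) {
        this.data = data;
    }

    // Deep copy: the byte array is duplicated as well.
    @Override
    public DemoPayload clone() {
        try {
            DemoPayload p = (DemoPayload) super.clone();
            p.data = data.clone();
            return p;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}

final class DemoToken implements Cloneable {
    String term;
    DemoPayload payload;

    // Shallow clone (the assumed old behaviour): the payload reference is copied,
    // so original and clone point at the same payload object.
    @Override
    public DemoToken clone() {
        try {
            return (DemoToken) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}

final class PayloadCloneDemo {
    // Clone the token, then clone the payload separately to make the copy fully private.
    static DemoToken privateCopy(DemoToken t) {
        DemoToken copy = t.clone();
        if (copy.payload != null) {
            copy.payload = copy.payload.clone();
        }
        return copy;
    }

    static DemoToken sample() {
        DemoToken t = new DemoToken();
        t.term = "foo";
        t.payload = new DemoPayload(new byte[] {1});
        return t;
    }

    static boolean sharedAfterShallowClone() {
        DemoToken t = sample();
        return t.clone().payload == t.payload;
    }

    static boolean sharedAfterPrivateCopy() {
        DemoToken t = sample();
        return privateCopy(t).payload == t.payload;
    }

    public static void main(String[] args) {
        System.out.println("shallow clone shares payload: " + sharedAfterShallowClone());
        System.out.println("private copy shares payload:  " + sharedAfterPrivateCopy());
    }
}
```

Whether the extra step is still needed depends on what Token.clone() does in the version at hand; the sketch only shows why a shallow clone alone would not give a fully private copy.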



[jira] Issue Comment Edited: (LUCENE-1919) Analysis back compat break

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756883#action_12756883 ] 

Mark Miller edited comment on LUCENE-1919 at 9/17/09 5:41 PM:
--------------------------------------------------------------

At the worst, we can just clone the delegate and not be reusable (the javadoc says you don't have to be reusable)

Not ideal, but it will fix, and cease to be a (possible performance) problem in 3.0 :)

      was (Author: markrmiller@gmail.com):
    At the worst, we can just clone the delegate and not be reusable (the javadoc says you don't have to be reusable)

Not ideal, but it will fix, and cease to be a problem in 3.0 :)
  
> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757178#action_12757178 ] 

Robert Muir commented on LUCENE-1919:
-------------------------------------

Uwe, I tested your patch, this is good :)

Just out of curiosity though, I am not sure I see the problem with next() then incrementToken() with my patch.
(Your patch is better imho, this is not the issue).

I tried your modified test and it passes with my patch also, is there something I am missing?

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Resolved: (LUCENE-1919) Analysis back compat break

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-1919.
-----------------------------------

    Resolution: Fixed

Committed revision: 816673

I added no CHANGES entry as we are all committers and no external persons involved. This bug was not in any previous release.

Thanks Robert for the great tests and all others for help resolving this bug. Mark, go on with the RC5! ;-)

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757140#action_12757140 ] 

Mark Miller commented on LUCENE-1919:
-------------------------------------

Commit away as soon as you can Uwe - I'll pump out RC5 as soon as I can right after.

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757232#action_12757232 ] 

Mark Miller commented on LUCENE-1919:
-------------------------------------

bq. Thanks Robert for the great tests and all others for help resolving this bug. Mark, go on with the RC5! 

Thanks a lot Uwe! I really appreciate the speed with which you have been addressing these RC issues!

It's taken a while to get this release out, but we have barely wasted a moment in cranking along.

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757224#action_12757224 ] 

Uwe Schindler commented on LUCENE-1919:
---------------------------------------

Token.clone() is not called by next() anymore (and was not even in 2.4.1), which is why we need the separate Payload cloning (Token.clone() would otherwise do it). The fully private Token is now produced like this (and this was the same in 1057): next(Token) is simply called with a new Token instance. The fix for this issue does it the same way (as in 2.4.1). The extension here is that it now works exactly the same with incrementToken(). After calling next(Token) with the private token, the Payload is explicitly cloned.

As I said a few comments above, I think the real problem is the following: if a next(Token) or incrementToken() method sets a Payload, the payload data is not copied; it's only a reference to the byte array. E.g. a next(Token) method may use a pre-allocated array and always set it as the data (which is perfectly legal in the reuse case), only modifying the array's contents. If you call next(Token) with a newly allocated Token, this Token is private, but if next(Token) again sets the preallocated byte array, it is not private anymore (you will see the modifications in the previous token, too). You would have the same bug as now (even in 2.4.1). I think this is why the payload is cloned separately: to be sure that it is private, like the newly allocated Token.
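The aliasing described above can be reproduced without Lucene. The following is only a minimal sketch under stated assumptions: DemoPayload, DemoToken and BufferReusingStream are hypothetical stand-ins (not the real Payload, Token or TokenStream classes), and nextPrivate() merely imitates the shape of the 2.4.1 next() wrapper, i.e. a fresh token followed by a payload clone.

```java
// Hypothetical stand-ins for illustration only; not Lucene's Payload/Token classes.
final class DemoPayload implements Cloneable {
    final byte[] data; // holds a reference to the caller's array, not a copy

    DemoPayload(byte[] data) {
        this.data = data;
    }

    @Override
    public DemoPayload clone() {
        // Deep copy: the clone owns its own byte array.
        return new DemoPayload(data.clone());
    }
}

final class DemoToken {
    String term;
    DemoPayload payload;
}

// An old-style stream whose next(Token) reuses one preallocated payload buffer.
final class BufferReusingStream {
    private final byte[] buffer = new byte[1]; // allocated once
    private final DemoPayload shared = new DemoPayload(buffer);
    private byte counter = 0;

    DemoToken next(DemoToken reusable) {
        buffer[0] = ++counter;     // only the contents of the array change
        reusable.term = "t" + counter;
        reusable.payload = shared; // the same payload instance every time
        return reusable;
    }

    // Imitates the wrapper: fresh token first, then clone the payload.
    DemoToken nextPrivate() {
        DemoToken t = next(new DemoToken());
        if (t != null && t.payload != null) {
            t.payload = t.payload.clone();
        }
        return t;
    }
}

final class PayloadAliasingDemo {
    public static void main(String[] args) {
        BufferReusingStream aliased = new BufferReusingStream();
        DemoToken a = aliased.next(new DemoToken());
        aliased.next(new DemoToken());
        // The token itself was new, but its payload bytes were overwritten.
        System.out.println("without clone: " + a.payload.data[0]); // prints 2

        BufferReusingStream safe = new BufferReusingStream();
        DemoToken c = safe.nextPrivate();
        safe.nextPrivate();
        System.out.println("with clone: " + c.payload.data[0]); // prints 1
    }
}
```

Passing a newly allocated token is not enough on its own here: the first token still observes the second call's payload bytes until the payload is cloned as well.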

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756901#action_12756901 ] 

Mark Miller commented on LUCENE-1919:
-------------------------------------

Nice - thanks Robert!

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757225#action_12757225 ] 

Mark Miller commented on LUCENE-1919:
-------------------------------------

bq. I think because of this the payload is cloned separately to be sure that it is private like the newly allocated Token.

That's what it appears to be if you look at the tests added with the change -

one of the tests is a next(Token) impl that just keeps setting the payload with the same payload instance that it has as a field.

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Updated: (LUCENE-1919) Analysis back compat break

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1919:
--------------------------------

    Attachment: LUCENE-1919.patch

Never mind: found it.

Uwe's patch plus a testcase that passes with his fix, but fails with the old patch I supplied.

What Uwe fixed was the case where a TokenStream only supports next(Token), but you consume it as foo = incrementToken(), followed by bar = next().
In this case bar should be fully private and not overwrite the contents of foo.

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Updated: (LUCENE-1919) Analysis back compat break

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1919:
----------------------------------

    Attachment: LUCENE-1919.patch

Here is my solution for the problem. Michael and I never thought about mixing the old and very old API, which seems to be in use. The problem now is that the behaviour changed for calling next(). The original 2.4.1 code looks like the following:

{code}
public Token next() throws IOException {
  final Token reusableToken = new Token();
  Token nextToken = next(reusableToken);

  if (nextToken != null) {
    Payload p = nextToken.getPayload();
    if (p != null) {
      nextToken.setPayload((Payload) p.clone());
    }
  }

  return nextToken;
}
{code}

The difference is that a new token is created *before* the call to the reusable next() methods (incrementToken() is in a sense reusable, too).

The attached patch restores exactly this functionality, but also for incrementToken(). It also removes an unneeded assignment to the delegate in the case of next(Token) delegating to next() (which is also a bug, because it makes the token no longer private for the caller - it can be overwritten by a later call to incrementToken()).

Tokens generated by next() must always be private and must not be shared or reused by the API as the reusableToken for later next()/incrementToken() calls. This is the problem with Robert's patch: the fully private token (which was cloned before) is then also available as the delegate for later calls to incrementToken() (for next(Token) the delegate is replaced, as Robert noticed, so there is no problem there).

The wrapper around incrementToken() does the following:
- save the current delegate
- replace delegate by a new Token (just like the old next() in 2.4.1)
- call incrementToken and assign to nextToken variable
- restore the reusable delegate
- after that all goes like for next(Token)

Simply said: the next() wrapper is completely decoupled and always uses a completely private (new) Token instance.
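The steps listed above can be sketched roughly as follows. This is not the committed patch: SketchToken, SketchStream and WordStream are hypothetical miniatures invented for illustration, with a single "delegate" field standing in for the shared token state, and the payload cloning step is left out.

```java
// Hypothetical miniature of the wrapper logic; names are illustrative, not Lucene's.
final class SketchToken {
    String term;
}

abstract class SketchStream {
    // Shared, reusable token that incrementToken() writes into.
    protected SketchToken delegate = new SketchToken();

    // New-style API: fill the current delegate, return false at end of stream.
    abstract boolean incrementToken();

    SketchToken current() {
        return delegate;
    }

    // Old-style next(): always hands back a token that is never reused later.
    final SketchToken next() {
        SketchToken saved = delegate;   // save the current delegate
        delegate = new SketchToken();   // replace it by a new token
        try {
            SketchToken nextToken = incrementToken() ? delegate : null;
            return nextToken;           // the private token, or null at the end
        } finally {
            delegate = saved;           // restore the reusable delegate
        }
    }
}

final class WordStream extends SketchStream {
    private final String[] words;
    private int pos = 0;

    WordStream(String... words) {
        this.words = words;
    }

    @Override
    boolean incrementToken() {
        if (pos >= words.length) {
            return false;
        }
        delegate.term = words[pos++];
        return true;
    }
}

final class DelegateSwapDemo {
    public static void main(String[] args) {
        WordStream ts = new WordStream("foo", "bar", "baz");
        ts.incrementToken();
        SketchToken foo = ts.current(); // shared delegate, holds "foo"
        SketchToken bar = ts.next();    // private token, holds "bar"
        System.out.println(foo.term + " " + bar.term); // prints "foo bar"
        ts.incrementToken();            // writes "baz" into the shared delegate
        System.out.println(foo.term + " " + bar.term); // prints "baz bar"
    }
}
```

In this sketch the token returned by next() is never installed as the delegate, so a later incrementToken() cannot touch it, and next() does not disturb what the delegate currently holds.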

I am not sure why this payload cloning code is in 2.4.1, but I moved it here, too. I think it is because of some old bug where a payload assigned in next(Token) was also shared by the TokenStream itself between more than one token. With this code, the Token is certainly fully private (and, unlike before, not reused later).

Using this patch, you should now even be able to mix all three APIs in one filter/consumer - but I still wouldn't do this :-)

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757444#action_12757444 ] 

Uwe Schindler commented on LUCENE-1919:
---------------------------------------

A really funny test fragment. Fascinating :-) Good to hear that the API even passes this test! Thanks for testing!

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, random_tests_fragment.txt
>
>
> Old and new style token streams don't mix well.



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756924#action_12756924 ] 

Robert Muir commented on LUCENE-1919:
-------------------------------------

bq. Still wonder if it's safe to get rid of that second clone()... the combinations are mind-bending. 

It's not safe to do this for the case of a TokenStream that only supports next(reusableTS) but not incrementToken().
Otherwise, next() does not return a full copy, but a reference to the delegate, which will be overwritten by future calls to next().
If you want to get rid of it, then you need to clone the delegate before deferring to next(Token).
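To make the hazard concrete, here is a minimal sketch, assuming a stream that only implements the reusable style. ReuseToken, OldStyleStream and NextVariants are hypothetical names invented for this illustration, and the safe variant passes a fresh token rather than cloning the delegate, which has the same effect in this simplified setting.

```java
// Hypothetical stand-ins; these are not Lucene's Token or TokenStream.
final class ReuseToken {
    String term;
}

// A stream that only implements the reusable next(Token) style.
final class OldStyleStream {
    private final String[] words;
    private int pos = 0;

    OldStyleStream(String... words) {
        this.words = words;
    }

    ReuseToken next(ReuseToken reusable) {
        if (pos >= words.length) {
            return null;
        }
        reusable.term = words[pos++]; // overwrites whatever the caller passed in
        return reusable;
    }
}

final class NextVariants {
    // Unsafe: every call returns the same long-lived delegate.
    static ReuseToken nextSharingDelegate(OldStyleStream in, ReuseToken delegate) {
        return in.next(delegate);
    }

    // Safe: a fresh token per call, so earlier results are never touched.
    static ReuseToken nextPrivate(OldStyleStream in) {
        return in.next(new ReuseToken());
    }
}

final class SharedDelegateDemo {
    public static void main(String[] args) {
        ReuseToken delegate = new ReuseToken();
        OldStyleStream shared = new OldStyleStream("foo", "bar");
        ReuseToken first = NextVariants.nextSharingDelegate(shared, delegate);
        NextVariants.nextSharingDelegate(shared, delegate);
        System.out.println(first.term); // prints "bar": the first result was overwritten

        OldStyleStream fresh = new OldStyleStream("foo", "bar");
        ReuseToken kept = NextVariants.nextPrivate(fresh);
        NextVariants.nextPrivate(fresh);
        System.out.println(kept.term); // prints "foo": still intact
    }
}
```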


> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Updated: (LUCENE-1919) Analysis back compat break

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1919:
--------------------------------

    Attachment: LUCENE-1919.patch

Better patch with a testcase for the issue.

Really, it's just that in next() the tokenwrapper must be cloned before calling incrementToken(), instead of after.

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757205#action_12757205 ] 

Uwe Schindler commented on LUCENE-1919:
---------------------------------------

bq. Just one question: Does anybody know, why there is this extra payload cloning?

This was introduced at the end of LUCENE-1057 without patch in JIRA. So it must have something to do with this. Maybe Yonik can explain.

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757263#action_12757263 ] 

Jason Rutherglen commented on LUCENE-1919:
------------------------------------------

For SOLR-908, I can try out the patch, though we've reverted
back to Solr trunk from 8/31. Unfortunately, I can't reproduce
the bug using a multithreaded query parsing unit test.

I decided the best avenue, given it must be thread related to
behave so randomly, is to alter QueryParser.getFieldQuery to not
call analyzer.reusableTokenStream and only use
analyzer.tokenStream. 

This effectively avoids the use of threadlocal reusable
tokenstreams for query parsing. I am left wondering about the
state of our index using reusable tokenstreams but luckily we
have so many documents, that this is less of a concern. 

The queries being truncated is of course very serious as users
see irrelevant results and assume that Lucene/Solr 2.9/1.4 is no
good.

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Updated: (LUCENE-1919) Analysis back compat break

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1919:
----------------------------------

    Attachment: LUCENE-1919.patch

This is just a patch that enhances/rectifies the BW testcase. It corrects the messages in assertTrue, because they should reflect that something is going wrong (just cosmetics). It also adds a check to these POSToken tests for the correctness of the other tokens' contents, not only whether one is a proper noun.

It also mixes consuming the new API into Robert's test and the call to next(Token), to check if the full-private Token returned from next() is still valid.

Nothing special, no other changes in core code.

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.



[jira] Updated: (LUCENE-1919) Analysis back compat break

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated LUCENE-1919:
---------------------------------

    Attachment: random_tests_fragment.txt

I've been doing some more testing... everything looks good.
I did some random testing too, just to see if there are any combination corner cases we forgot about... it tries random combinations of filters that support and use the different old/new style APIs.  It was inline with other Solr tests, so I just cut-n-pasted it to this file for posterity.

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, random_tests_fragment.txt
>
>
> Old and new style token streams don't mix well.



[jira] Commented: (LUCENE-1919) Analysis back compat break

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757207#action_12757207 ] 

Mark Miller commented on LUCENE-1919:
-------------------------------------

The copy of payload came in LUCENE-1057.

The clone came in LUCENE-1062.

> Analysis back compat break
> --------------------------
>
>                 Key: LUCENE-1919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1919
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>            Assignee: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch, LUCENE-1919.patch
>
>
> Old and new style token streams don't mix well.
