You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@lucene.apache.org by Mark Miller <ma...@gmail.com> on 2009/07/01 15:13:47 UTC

Re: [jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements

Hows the progress here guys? I have 2 or 3 issues that relate to this, and I
really don't want to commit/finish them until this is done ...

RE: [jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements

Posted by Uwe Schindler <uw...@thetaphi.de>.

After thinking one more time about it, there are still possibilities to
break BW in some corner cases with non-final classes. See my last comment in
the issue!

 

This new API is hard to integrate in a BW compatible way. But in all cases,
this case and the new TokenStream API is a show stopper for 2.9.

 

Uwe

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Uwe Schindler [mailto:uwe@thetaphi.de] 
Sent: Wednesday, July 01, 2009 3:34 PM
To: java-dev@lucene.apache.org
Subject: RE: [jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API
improvements

 

I do not work on this at the moment. The current status is:

- All final token stream/filter classes only need to implement
incrementToken() as they have no backwards compatibility problem.

- If it is possible for somebody (or better likely the case) to override one
of both methods, where only one of them is implemented in lucene, there may
be the problem that overridden code is never called (because incrementToken
is always preferred).

 

So: For most analyzers from Lucene core this is no problem, as they are
final. Only classes like CharTokenizer, that are written explicitely for
subclassing must implement both. I will revert my changes (remove all
next(Token) from all streams) soon (post a new patch). But this problem is
the same we had before: If one overrides only next() in one of the non-final
streams, this method is also never called. So this was a break in backwards
compatibility in the past, too.

 

My recommendation: Even if classes are not final, we could convert
next(Token) to incrementToken() and remove the old ones, if it is not likely
the case, that somebody have tried to customize one of the deprecated
methods. And a clear not in the Changes.txt.

 

In my opinion, this small backwards break is better than creating a switch
like userNewAPI, that is never set to 1, so the code is never used in Lucene
2.9 and everybody breaks when 3.0 comes out.

 

In my opinion, this patch can be extensive tested, the questions about
backwards compatibility are solved. What is still broken is Sink and
TeeTokenizer, the old classes must be deprecated, because it is different if
you store attribute states in the map or Tokens and is not implementable in
one class (it is already broken in trunk, as the javadocs always talk about
Token instances in the Set, but when useneAPI is on they are suddenly
Attribute states). I hope Michael will have time to review this, too. It
seems that we are the only ones who can follow the thread :-) and all these
"hey here is a BW break." hick-hacks.

 

Uwe

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Mark Miller [mailto:markrmiller@gmail.com] 
Sent: Wednesday, July 01, 2009 3:14 PM
To: java-dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API
improvements

 

Hows the progress here guys? I have 2 or 3 issues that relate to this, and I
really don't want to commit/finish them until this is done ...

Re: [jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements

Posted by Mark Miller <ma...@gmail.com>.

>> I hope Michael will have time to review this, too. It seems that we are
the only ones who can follow the thread J and all these “hey here is a BW
break…” hick-hacks.

Personally, I'm not a big fan of the new API. I think its going to be *much*
harder for new users to grok/manipulate. These back compat breaks are only
going to make it that much more difficult. I think that it could be a fine
trade off if Flexible Indexing ends up being able to move mountains for me,
but as I think that's going to end up being niche coolness, I'm just not
100% keen on the new API. So basically, I don't have enough interest to
catch up with these patches and read through this whole thread. I'm also
concerned with a point someone previously brought up - we are deprecating
the old API, telling users to move, but we are labeling the new API as
experimental and subject to change. Jump user, jump.

-- 
-- 
- Mark

http://www.lucidimagination.com

RE: [jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements

Posted by Uwe Schindler <uw...@thetaphi.de>.

I do not work on this at the moment. The current status is:

- All final token stream/filter classes only need to implement
incrementToken() as they have no backwards compatibility problem.

- If it is possible for somebody (or better likely the case) to override one
of both methods, where only one of them is implemented in lucene, there may
be the problem that overridden code is never called (because incrementToken
is always preferred).

 

So: For most analyzers from Lucene core this is no problem, as they are
final. Only classes like CharTokenizer, that are written explicitely for
subclassing must implement both. I will revert my changes (remove all
next(Token) from all streams) soon (post a new patch). But this problem is
the same we had before: If one overrides only next() in one of the non-final
streams, this method is also never called. So this was a break in backwards
compatibility in the past, too.

 

My recommendation: Even if classes are not final, we could convert
next(Token) to incrementToken() and remove the old ones, if it is not likely
the case, that somebody have tried to customize one of the deprecated
methods. And a clear not in the Changes.txt.

 

In my opinion, this small backwards break is better than creating a switch
like userNewAPI, that is never set to 1, so the code is never used in Lucene
2.9 and everybody breaks when 3.0 comes out.

 

In my opinion, this patch can be extensive tested, the questions about
backwards compatibility are solved. What is still broken is Sink and
TeeTokenizer, the old classes must be deprecated, because it is different if
you store attribute states in the map or Tokens and is not implementable in
one class (it is already broken in trunk, as the javadocs always talk about
Token instances in the Set, but when useneAPI is on they are suddenly
Attribute states). I hope Michael will have time to review this, too. It
seems that we are the only ones who can follow the thread :-) and all these
"hey here is a BW break." hick-hacks.

 

Uwe

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Mark Miller [mailto:markrmiller@gmail.com] 
Sent: Wednesday, July 01, 2009 3:14 PM
To: java-dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API
improvements

 

Hows the progress here guys? I have 2 or 3 issues that relate to this, and I
really don't want to commit/finish them until this is done ...