Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2009/06/09 18:43:07 UTC

[jira] Created: (LUCENE-1678) Deprecate Analyzer.tokenStream

Deprecate Analyzer.tokenStream
------------------------------

                 Key: LUCENE-1678
                 URL: https://issues.apache.org/jira/browse/LUCENE-1678
             Project: Lucene - Java
          Issue Type: Bug
          Components: Analysis
            Reporter: Michael McCandless
            Assignee: Michael McCandless
            Priority: Minor
             Fix For: 2.9


The addition of reusableTokenStream to the core analyzers unfortunately broke back compat of external subclasses:

    http://www.nabble.com/Extending-StandardAnalyzer-considered-harmful-td23863822.html

On upgrading, such subclasses would silently not be used anymore, since Lucene's indexing invokes reusableTokenStream.

I think we should at least deprecate Analyzer.tokenStream today, so that users see deprecation warnings if their classes override this method.  But going forward, when we want to change the API of core classes that are extended, I think we have to introduce entirely new classes to keep back compatibility.
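
To make the failure mode concrete, here is a minimal sketch of how an override can be silently bypassed. It does not use the real Lucene classes: CoreAnalyzer, CustomAnalyzer and index are hypothetical stand-ins, and the methods return a String rather than a TokenStream purely to keep the example self-contained.

```java
import java.io.Reader;
import java.io.StringReader;

public class SilentOverrideDemo {
    // Hypothetical stand-in for a core analyzer after reusableTokenStream was added.
    public static class CoreAnalyzer {
        public String tokenStream(String field, Reader reader) {
            return "core";
        }

        // New method called by the indexer; it does not delegate to tokenStream.
        public String reusableTokenStream(String field, Reader reader) {
            return "core";
        }
    }

    // External subclass written before the upgrade: it only overrides tokenStream.
    public static class CustomAnalyzer extends CoreAnalyzer {
        @Override
        public String tokenStream(String field, Reader reader) {
            return "custom";
        }
    }

    // Stand-in for the indexing code path, which invokes reusableTokenStream.
    public static String index(CoreAnalyzer analyzer) {
        return analyzer.reusableTokenStream("body", new StringReader("text"));
    }

    public static void main(String[] args) {
        // Prints "core": the subclass override is never consulted.
        System.out.println(index(new CustomAnalyzer()));
    }
}
```

Nothing fails at compile time or at run time here, which is why a deprecation warning on the overridden method would at least give such users a signal.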

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Jun 9, 2009 at 7:19 PM, Earwin Burrfoot (JIRA) <ji...@apache.org> wrote:
> You go zealously for back-compat - you sacrifice readability/maintainability of your code but free users from any troubles when they want to 'simply upgrade'. You adopt more relaxed policy - you sacrifice users' time, but in return you gain cleaner codebase and new stuff can be written and used faster.

Not sure I agree with that - if changes become too easy you can get a
thrashing effect... change just because someone thought it was a
little better can lead to more chaos.  IMO, changes to interfaces
should be clearly better than what existed before.  Stable interfaces
bring benefits to Lucene contributors/developers as well (not just
users).

-Yonik



[jira] Resolved: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1678.
----------------------------------------

    Resolution: Fixed



[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718079#action_12718079 ] 

Michael McCandless commented on LUCENE-1678:
--------------------------------------------


bq. Mike was gung ho for it for a while, and even he backed off. 

Well... my particular itch (most recently!) was an addition to Lucene
that'd let us conditionalize the default settings so that new users
get the latest & greatest, but back-compat users can easily preserve
old behavior.

Ie, it was a software change, not a policy change; I tried hard to
steer clear of any proposed changes to back-compat policy.

But, for better or worse, back-compat policy is one of those
"magnetic" topics: whenever you get too close to it, it suddenly
sticks to you and takes over your thread.

And in the end we arrived at a workable solution to my particular
itch, which is to make such settings explicit or switch to new APIs
that change the defaults (eg the new FSDir.open).

That said, improving our back compat policy *is* an important and
amazingly complex topic.




Re: Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Jun 9, 2009 at 8:23 PM, Earwin Burrfoot <ea...@gmail.com> wrote:
>> IMO, changes to interfaces should be clearly better than what existed before.
> Recent changes to DISI? Were they clearly for the better?

Recent *proposed* changes.... yes, for 3.0.
If you include the scorer changes, it's a bigger change than it
appears, and one I'm not sure I'd be comfortable with in a point
release.

-Yonik
http://www.lucidimagination.com



Re: Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by Earwin Burrfoot <ea...@gmail.com>.
@Mark:
>> Okay, there's an escape hatch I (and someone else) mentioned on the list
>> before. Adopting a fixed release cycle with small intervals between releases
>> (compared to what we have now). Fixed - as in, releases are made each N
>> months instead of when everyone feels they finished and polished up all
>> their pet projects and there's nothing else exciting to do. That way we can
>> keep the current policy, but deletion-through-deprecation approach will
>> work, at last!
> That's a big change. I think it's a nice idea, but I don't know how practical
> it is. Most of us are basically volunteering time for this type of thing.
> Even still, with the pace of development lately (and you can be sure that
> the current pace is a *new* thing, Lucene did not always have this amount of
> activity), it might make sense.
You're missing the most important point. Fixed schedule means that the
only reason not to do a release is the total absence of changes.
No matter how much or how few changes are released each time, fixed
schedule gives you predictable lifecycle for all your
deprecation/back-compat needs.

> But that idea needs a champion, and frankly
> I don't have the time right now (it wouldn't likely be in my realm anyway).
> And that's probably the deal with most others. They have work and/or other
> itches that are higher priority than championing a big change.
And here we get to one of the roots of the problem. The root that is
going to stay.

>> bq. Giving up is really not the answer though
>> It is the answer. I have no moral right to hammer my ideals into heads
>> that did tremendously more for the project, than I did. And maintaining a
>> patch queue over Lucene trunk is not 'that' hard.
> It's not about hammering your ideals - that almost feels like what you are
> doing, but frankly, it doesn't help. If you even just keep prompting the
> issue as it dies away you will likely keep progress going. There is a
> solution that everyone will accept. I promise you that. It's more work than
> it looks to find that solution and guide it to fruition though. It's fully
> possible, and I'm sure it will happen eventually. Would have bet even money
> that Mike had it a few weeks ago. No dice, it looks like, though ;)
I consciously took a bit of an extremist stance in the hope of shifting the
mean. Okay, I will try ditching it in favour of gently bugging people
like Grant did in the comment that spawned this discussion. :)

@Yonik:
>> You go zealously for back-compat - you sacrifice readability/maintainability of your code but free users from any troubles when they want to 'simply upgrade'. You adopt more relaxed policy - you sacrifice users' time, but in return you gain cleaner codebase and new stuff can be written and used faster.
> Not sure I agree with that - if changes become too easy you can get a
> thrashing effect... change just because someone thought it was a
> little better can lead to more chaos.
You're right.
I'm not advocating anarchy. :) But currently we are afraid to break
anything at all, and that is as far away from juste milieu as the
chaos you speak of.

> IMO, changes to interfaces should be clearly better than what existed before.
Recent changes to DISI? Were they clearly for the better?

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785



Re: Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by Mark Miller <ma...@gmail.com>.
Earwin Burrfoot (JIRA) wrote:
> -----------------------------------------
>
> bq. If there are sane/smart ways to change our back compat policy, I think you have seen that no one would object.
> It's not a matter of finding a smart way. It is a matter of sacrifice that has to be made and readiness to take the blame for decision that can be unpopular with someone.
> You go zealously for back-compat - you sacrifice readability/maintainability of your code but free users from any troubles when they want to 'simply upgrade'. You adopt more relaxed policy - you sacrifice users' time, but in return you gain cleaner codebase and new stuff can be written and used faster.
> There's no way to ride two horses at once.
>
> Some people are comfortable with current policies. Few cringe when they hear things like above. Most theoretically want to relax the rules. Nobody's ready to give up something for it.
>   
I don't agree. I think everyone would be willing to give something up. 
But some won't want to give up certain things.
> Okay, there's an escape hatch I (and someone else) mentioned on the list before. Adopting a fixed release cycle with small intervals between releases (compared to what we have now). Fixed - as in, releases are made each N months instead of when everyone feels they finished and polished up all their pet projects and there's nothing else exciting to do. That way we can keep the current policy, but deletion-through-deprecation approach will work, at last!
>   
That's a big change. I think it's a nice idea, but I don't know how
practical it is. Most of us are basically volunteering time for this
type of thing. Even still, with the pace of development lately (and you
can be sure that the current pace is a *new* thing, Lucene did not
always have this amount of activity), it might make sense. But that idea
needs a champion, and frankly I don't have the time right now (it
wouldn't likely be in my realm anyway). And that's probably the deal with
most others. They have work and/or other itches that are higher priority
than championing a big change.
> This solution is halfassed, I can already see discussions like "That was a big change, let's keep the deprecates around longer, say - for a couple of releases.", it doesn't solve good-name-thrashing problem, as you have to go through two rounds of deprecation to change semantics on something, but keep the name.
> But this is something better than what we have now, a-a-and this is something that needs committer backing.
>
> bq. Thats a great indication to me that the issue is not simple.
> The issue is simple, the choice is not. And maintaining status quo is free.
>   
Right. It's not about anyone arguing against it. People made arguments
and raised points from various angles - none of that biases the 
conclusion, it only strengthens it. I poke holes at things I fully 
support - it should survive the shot if it makes sense. It comes down to 
the effort involved in guiding this forward. I know the majority want to 
see something succeed. Probably the best argument is the one Mike first 
championed - we are hurting new users by saddling them with back compat. 
I think we all want a better compromise, leaning further towards out of 
the box experience than we do now.
> bq. Giving up is really not the answer though
> It is the answer. I have no moral right to hammer my ideals into heads that did tremendously more for the project, than I did. And maintaining a patch queue over Lucene trunk is not 'that' hard.
>   
It's not about hammering your ideals - that almost feels like what you
are doing, but frankly, it doesn't help. If you even just keep prompting
the issue as it dies away you will likely keep progress going. There is
a solution that everyone will accept. I promise you that. It's more work
than it looks to find that solution and guide it to fruition though. It's
fully possible, and I'm sure it will happen eventually. Would have bet
even money that Mike had it a few weeks ago. No dice, it looks like, though ;)

- Mark
>
>   
>


-- 
- Mark

http://www.lucidimagination.com






[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717862#action_12717862 ] 

Earwin Burrfoot commented on LUCENE-1678:
-----------------------------------------

bq. If there are sane/smart ways to change our back compat policy, I think you have seen that no one would object.
It's not a matter of finding a smart way. It is a matter of a sacrifice that has to be made and readiness to take the blame for a decision that can be unpopular with someone.
You go zealously for back-compat - you sacrifice readability/maintainability of your code but free users from any troubles when they want to 'simply upgrade'. You adopt a more relaxed policy - you sacrifice users' time, but in return you gain a cleaner codebase and new stuff can be written and used faster.
There's no way to ride two horses at once.

Some people are comfortable with current policies. A few cringe when they hear things like the above. Most theoretically want to relax the rules. Nobody's ready to give up something for it.

Okay, there's an escape hatch I (and someone else) mentioned on the list before. Adopting a fixed release cycle with small intervals between releases (compared to what we have now). Fixed - as in, releases are made every N months instead of when everyone feels they have finished and polished up all their pet projects and there's nothing else exciting to do. That way we can keep the current policy, but the deletion-through-deprecation approach will work, at last!
This solution is half-assed: I can already see discussions like "That was a big change, let's keep the deprecates around longer, say - for a couple of releases." It also doesn't solve the good-name-thrashing problem, as you have to go through two rounds of deprecation to change the semantics of something but keep the name.
But this is something better than what we have now, a-a-and this is something that needs committer backing.

bq. Thats a great indication to me that the issue is not simple.
The issue is simple, the choice is not. And maintaining status quo is free.

bq. Giving up is really not the answer though
It is the answer. I have no moral right to hammer my ideals into heads that did tremendously more for the project than I did. And maintaining a patch queue over Lucene trunk is not 'that' hard.




[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718122#action_12718122 ] 

Shai Erera commented on LUCENE-1678:
------------------------------------

We've had this thread http://www.nabble.com/Lucene%27s-default-settings---back-compatibility-td23605466.html, and in the latest post (http://www.nabble.com/Re%3A-Lucene%27s-default-settings---back-compatibility-p23792927.html) I tried to put together some wording for a revised (and relaxed) back-compat policy. I believe it was Grant who asked for some writeup to get to the users' list, and I read also that we may want to discuss each item separately, to get to a consensus.

Perhaps we can continue the discussion on that thread, and try to get to a consensus on any of the items? We don't necessarily need to change all of it in one day, but getting some feedback from you on any of the items can help bring that discussion back to life, and hopefully reach a consensus.

As was said on this thread, persistence will eventually drive us to reach a consensus, so I'm being persistent :).



[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717888#action_12717888 ] 

Grant Ingersoll commented on LUCENE-1678:
-----------------------------------------

bq. If there are sane/smart ways to change our back compat policy, I think you have seen that no one would object.

The sane/smart way is to do it on a case by case basis.  Here is a specific case.  Generalizing it a bit, the places where it should be more easily relaxable are the cases where we know very few people make customizations, such as implementing Fieldable or FieldCache.

As for this specific case, the original change was the thing that broke back compat.  So, given it is already broken, why not fix it the right way?



[jira] Updated: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1678:
---------------------------------------

    Attachment: LUCENE-1678.patch

OK, inspired by Uwe's persistence on LUCENE-1693, I realized one clean
way to fix the back-compat break here is to use reflection, when
creating the Analyzer, to determine whether the class overrides the
tokenStream method.  Then, in reusableTokenStream, we forcibly
fall back to tokenStream if it does.

Attached patch, with a test case showing the issue, implements this
approach, and it works well.  With this approach there's no reason to
deprecate tokenStream.
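
The approach described above can be sketched roughly as follows. This is not the attached patch: CoreAnalyzer and CustomAnalyzer are hypothetical stand-ins, the methods return a String instead of a TokenStream, and the field name overridesTokenStream is an assumption made for illustration.

```java
import java.io.Reader;
import java.io.StringReader;

public class ReflectionFallbackDemo {
    // Hypothetical stand-in for a core analyzer that supports stream reuse.
    public static class CoreAnalyzer {
        // True when a subclass declares its own tokenStream(String, Reader).
        private final boolean overridesTokenStream;

        public CoreAnalyzer() {
            boolean overridden;
            try {
                // getMethod returns the most specific public method, so the
                // declaring class differs from CoreAnalyzer only when a
                // subclass has overridden tokenStream.
                overridden = getClass()
                        .getMethod("tokenStream", String.class, Reader.class)
                        .getDeclaringClass() != CoreAnalyzer.class;
            } catch (NoSuchMethodException e) {
                overridden = false;
            }
            this.overridesTokenStream = overridden;
        }

        public String tokenStream(String field, Reader reader) {
            return "core";
        }

        public String reusableTokenStream(String field, Reader reader) {
            if (overridesTokenStream) {
                // Fall back so the subclass's customization is honored.
                return tokenStream(field, reader);
            }
            return "core (reused)";
        }
    }

    // External subclass that predates reusableTokenStream.
    public static class CustomAnalyzer extends CoreAnalyzer {
        @Override
        public String tokenStream(String field, Reader reader) {
            return "custom";
        }
    }

    public static void main(String[] args) {
        Reader r = new StringReader("text");
        System.out.println(new CoreAnalyzer().reusableTokenStream("body", r));   // core (reused)
        System.out.println(new CustomAnalyzer().reusableTokenStream("body", r)); // custom
    }
}
```

The reflection check runs once per analyzer instance, at construction, so the fallback decision adds only a boolean test on each call; the cost is that subclasses overriding tokenStream lose the benefit of reuse.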




[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717831#action_12717831 ] 

Mark Miller commented on LUCENE-1678:
-------------------------------------

>>Second this. Though I lost any hope for sane Lucene release/compat rules. 

Why? Have you seen anyone arguing for anything else?

If there are sane/smart ways to change our back compat policy, I think you have seen that no one would object.

It's a complicated topic that has come up for discussion many times, but I don't think the current policy is insane. And I have seen most people supporting whatever is best for Lucene. But - see all of the posts on the topic. It's complicated. Nobody even really torpedoed anything; it's more that enough issues were raised and no one with a proper amount of authority felt comfortable stepping up to the plate. Mike was gung ho for it for a while, and even he backed off. That's a great indication to me that the issue is not simple. Back compat currently is not insane, but I think we all agree it should be loosened somehow in the future.

The way Lucene stuff generally goes, if someone like Grant or Mike really wanted to push changes, the changes would happen. I think they both see that the effort involved in such a change is not small though. Back compat is like our constitution. It's a pain in the butt to change in a way that everyone could get on board with. Even still, if someone really wanted to, they could probably push through that. It seems we haven't gotten to such a point with anyone yet though.

Giving up is really not the answer though - that's why the discussion has come and gone in the past. The effort to get anything done grew (in terms of ideas as much as any implementation), and one by one, the participants dropped out.



Re: [jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by Mark Miller <ma...@gmail.com>.
Michael McCandless (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718080#action_12718080 ] 
>
> Michael McCandless commented on LUCENE-1678:
> --------------------------------------------
>
>
> bq. The way Lucene stuff generally goes, if someone like Grant or Mike really wanted to push changes, the changes would happen. 
>
> Well, it's consensus that we all need to reach (at least enough
> consensus to vote on it), and on complex topics it's not easy to get
> to consensus.
>   
Right. I didn't mean you guys could ram it down everyone's throats - I
basically meant, if you wanted to build the consensus, and you thought
the idea was good - you could do it easier than many of the new guys
might think. I've seen it happen before.
> bq. Giving up is really not the answer though - thats why the discussion has come and gone in the past.
>
> I don't think anyone has given up.  The issue still smoulders and
> flares up here and there (like, this issue).  Eventually we'll get
> enough consensus for something concrete to change.
>   
I think some of the newer people in the community do sink into a give-up 
mentality (based on the comments I've seen). I think the issue, and 
the reason I even responded to this to begin with, is that people jump to 
conclusions about what's going on here. They think we committers are 
stubborn and/or stuck in our old ways. That we are too in love with our 
back compat policy ;) It's common for some of us to point out things that 
slow issues down, and we don't always contribute much towards pushing 
some issues forward. Some of the newer guys in the community have gotten 
the wrong idea about that. Things tend to happen slowly, but with 
persistence they do happen. Lucene is kind of a conservative project, 
but I don't like the idea that some of the newer guys see things as 
locked up. I've been around long enough to know they are not. Everything 
is up for debate, and things have been moving steadily forward 
in Lucene land. Again, it's like a constitution though - if it was easy 
to whip around the rules, we would have a lot of problems. When I make 
comments for or against something, I try to think about what's best for 
the community. I think others likely do the same thing.

Anyway, when I see those comments, I think - there is no need to lash 
out with little jibes. Persistence will move things forward. It is about 
consensus building, and that takes time and effort. More for some than 
others. The funny thing is, from what I've seen, when push comes to 
shove, it's easier to get consensus around here than some of the email 
discussions might suggest. It just takes effort and persistence.
>
> bq. I have no moral right to hammer my ideals into heads that did tremendously more for the project, than I did.
>
> In fact you do & should.  This is exactly how change happens.  Here's
> a great (though sexist) quote:
>
> "The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man." - George Bernard Shaw

Right. Even though I think some of the newer guys have an odd (minor) 
disrespect for what came before them (when Lucene was younger, most of 
these issues didn't yet exist! And the project has been very 
successful/stable thus far), I am extremely happy that there are a bunch 
of new people shaking things up. I'd rather they didn't go away (thinking 
Lucene is locked up in insanity) or stop talking about improving back 
compat :)

-- 
- Mark

http://www.lucidimagination.com






[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718080#action_12718080 ] 

Michael McCandless commented on LUCENE-1678:
--------------------------------------------


bq. The way Lucene stuff generally goes, if someone like Grant or Mike really wanted to push changes, the changes would happen. 

Well, it's consensus that we all need to reach (at least enough
consensus to vote on it), and on complex topics it's not easy to get
to consensus.

bq. Giving up is really not the answer though - that's why the discussion has come and gone in the past.

I don't think anyone has given up.  The issue still smoulders and
flares up here and there (like, this issue).  Eventually we'll get
enough consensus for something concrete to change.


bq. I have no moral right to hammer my ideals into heads that did tremendously more for the project, than I did.

In fact you do & should.  This is exactly how change happens.  Here's
a great (though sexist) quote:

"The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man." - George Bernard Shaw






[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717819#action_12717819 ] 

Grant Ingersoll commented on LUCENE-1678:
-----------------------------------------

I frankly don't like renaming something like this.  This is, once again, a case of back compatibility biting us.  If, instead of working around back compat, we had just made Analyzer.tokenStream reusable, we wouldn't have to do this.  Now, instead, we are going to have a convoluted name (reusableTokenStream) for something.

In my mind, better to just make .tokenStream do the right thing and get rid of reusableTokenStream.



[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Earwin Burrfoot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717823#action_12717823 ] 

Earwin Burrfoot commented on LUCENE-1678:
-----------------------------------------

Seconded, though I have lost any hope for sane Lucene release/compat rules.



[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718032#action_12718032 ] 

Michael McCandless commented on LUCENE-1678:
--------------------------------------------

bq. The sane/smart way is to do it on a case by case basis.

Right, and the huge periodic discussions on back-compat do soften
"our" stance on these.  For example LUCENE-1542 was just such a case,
where we chose to simply fix the [rather nasty] bug at the expense of
possible apps relying on the broken behavior.

LUCENE-1679 is another (rather trivial) example, where we plan to
change certain fields in WildcardTermEnum to be final.




[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730303#action_12730303 ] 

Uwe Schindler commented on LUCENE-1678:
---------------------------------------

Your solution is also a nice way to fix the last problems with the core token streams in LUCENE-1693: if somebody overrides a deprecated method in one of the non-final core TokenStreams, the method is never called, because the indexer uses incrementToken by default. The same approach can be used to fix this problem in TokenStream, too.

I will prepare a patch for this (I am currently preparing a new patch with some tests and a fix for the case where the number of attribute instances does not equal the number of attributes).
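
To make the "overridden method is never called" problem concrete, here is a minimal sketch in plain Java. The classes (BaseAnalyzer, CustomAnalyzer, BypassDemo) are hypothetical stand-ins and not Lucene's real classes, and the methods return strings instead of token streams to keep it short. The reflection check is only one possible way to detect the override; it is an assumption for illustration, not necessarily what the attached patch does.

```java
// Hypothetical stand-in for a core analyzer (not the real Lucene class).
class BaseAnalyzer {
    // Old entry point that external subclasses may have overridden.
    @Deprecated
    String tokenStream(String text) { return "base:" + text; }

    // Newer entry point that the (hypothetical) indexer calls.
    String reusableTokenStream(String text) { return "base-reusable:" + text; }
}

// External subclass written against the old API only.
class CustomAnalyzer extends BaseAnalyzer {
    @Override
    String tokenStream(String text) { return "custom:" + text; }
}

class BypassDemo {
    // True if some class between a's runtime class and BaseAnalyzer
    // declares its own tokenStream(String).
    static boolean overridesTokenStream(BaseAnalyzer a) {
        for (Class<?> c = a.getClass(); c != BaseAnalyzer.class; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod("tokenStream", String.class);
                return true;
            } catch (NoSuchMethodException e) {
                // Not declared at this level; keep walking up.
            }
        }
        return false;
    }

    // Hypothetical indexer. Without the guard it always uses the new method,
    // so a subclass that only overrides the old one is silently ignored.
    static String index(BaseAnalyzer a, String text, boolean guard) {
        if (guard && overridesTokenStream(a)) {
            return a.tokenStream(text);
        }
        return a.reusableTokenStream(text);
    }

    public static void main(String[] args) {
        System.out.println(index(new CustomAnalyzer(), "x", false)); // base-reusable:x
        System.out.println(index(new CustomAnalyzer(), "x", true));  // custom:x
    }
}
```

Without the guard, CustomAnalyzer's override has no effect, which is the silent breakage described in the issue. Marking the old method @Deprecated at least makes the compiler report that CustomAnalyzer overrides a deprecated API.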

> Deprecate Analyzer.tokenStream
> ------------------------------
>
>                 Key: LUCENE-1678
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1678
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1678.patch
>
>
> The addition of reusableTokenStream to the core analyzers unfortunately broke back compat of external subclasses:
>     http://www.nabble.com/Extending-StandardAnalyzer-considered-harmful-td23863822.html
> On upgrading, such subclasses would silently not be used anymore, since Lucene's indexing invokes reusableTokenStream.
> I think we should at least deprecate Analyzer.tokenStream, today, so that users see deprecation warnings if their classes override this method.  But going forward when we want to change the API of core classes that are extended, I think we have to introduce entirely new classes, to keep back compatibility.



[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718031#action_12718031 ] 

Michael McCandless commented on LUCENE-1678:
--------------------------------------------

bq. So, given it is already broken, why not fix it the right way?

Because two wrongs don't make a right?

(I assume you're suggesting changing tokenStream to match reusableTokenStream, ie allowing it to return a reused TokenStream between calls, and then deprecating reusableTokenStream).

Apps that get multiple TokenStreams from a single Analyzer and then iterate through them would silently break if we up and made this second non-back-compatible change.
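
A small sketch of that failure mode, using hypothetical classes (SimpleTokenStream, SimpleAnalyzer, ReuseHazardDemo) rather than Lucene's real API, and assuming a reused stream is simply reset onto the new text:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a token stream: hands out tokens one at a time.
class SimpleTokenStream {
    private Iterator<String> tokens;

    SimpleTokenStream(String text) { reset(text); }

    // Re-point this instance at new text, discarding any unread tokens.
    void reset(String text) {
        tokens = Arrays.asList(text.split("\\s+")).iterator();
    }

    // Returns the next token, or null when exhausted.
    String next() { return tokens.hasNext() ? tokens.next() : null; }
}

// Hypothetical analyzer exposing both contracts discussed in the thread.
class SimpleAnalyzer {
    private SimpleTokenStream cached;

    // Old contract: every call returns an independent stream.
    SimpleTokenStream tokenStream(String text) {
        return new SimpleTokenStream(text);
    }

    // Reuse contract: the same instance is recycled between calls.
    SimpleTokenStream reusableTokenStream(String text) {
        if (cached == null) {
            cached = new SimpleTokenStream(text);
        } else {
            cached.reset(text);
        }
        return cached;
    }
}

class ReuseHazardDemo {
    // Drain a stream into a list.
    static List<String> drain(SimpleTokenStream ts) {
        List<String> out = new ArrayList<>();
        for (String t = ts.next(); t != null; t = ts.next()) {
            out.add(t);
        }
        return out;
    }

    // Obtain two streams up front, then read the first one.
    static List<String> firstStreamTokens(boolean reuse) {
        SimpleAnalyzer a = new SimpleAnalyzer();
        SimpleTokenStream first = reuse ? a.reusableTokenStream("alpha beta") : a.tokenStream("alpha beta");
        SimpleTokenStream second = reuse ? a.reusableTokenStream("gamma delta") : a.tokenStream("gamma delta");
        return drain(first);
    }

    public static void main(String[] args) {
        System.out.println(firstStreamTokens(false)); // [alpha, beta]
        System.out.println(firstStreamTokens(true));  // [gamma, delta]
    }
}
```

With reuse, "first" and "second" are the same object, so the app reads the second text where it expected the first, and nothing throws to warn it.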



Re: [jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, Jun 10, 2009 at 12:45 PM, Mark Miller<ma...@gmail.com> wrote:

> I've heard that one before ;) In fact, we pretty much committed to releasing
> more often. Now if 2.9 would just fall into line with our darn commitments
> :)

I hear you!

So... how about we try to wrap up 2.9/3.0 and ship with what we have,
now? It's been 8 months since 2.4.0 was released, and 2.9's got plenty
of new stuff, and we are all itching to remove these deprecated APIs,
switch to Java 1.5, etc.

We should try to finish the issues that are open and underway... but I
think many of the issues marked 2.9 now, especially those not even
started, should not in fact block 2.9.

Mike



Re: [jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by Mark Miller <ma...@gmail.com>.
Michael McCandless (JIRA) wrote:
> --------------------------------------------
>
> bq. Adopting a fixed release cycle with small intervals between releases (compared to what we have now). 
>
> I think this is almost a good solution, though instead of "fixed" it
> could be that we try [harder] to do major releases more frequently.
> Let's face it: Lucene is changing quite quickly now, so it seems
> reasonable that the major releases also come quickly.
>
>   

>>though instead of "fixed" it
>>could be that we try [harder] to do major releases more frequently.


I've heard that one before ;) In fact, we pretty much committed to 
releasing more often. Now if 2.9 would just fall into line with our darn 
commitments :)

-- 
- Mark

http://www.lucidimagination.com






[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718081#action_12718081 ] 

Michael McCandless commented on LUCENE-1678:
--------------------------------------------

bq. Adopting a fixed release cycle with small intervals between releases (compared to what we have now). 

I think this is almost a good solution, though instead of "fixed" it
could be that we try [harder] to do major releases more frequently.
Let's face it: Lucene is changing quite quickly now, so it seems
reasonable that the major releases also come quickly.

I say "almost" because a lot of the pain in implementing our
current policy is the need to have a "stepping stone" between old and
new.  I.e., we now must always do a release that deprecates old APIs and
introduces new ones so that you can upgrade to that, fix deprecations,
and know you're set for the next major release.  So, e.g., changes to
interfaces are a big problem.  If we were free to suddenly make a new
major release, with instructions on how to migrate old -> new, that'd
be very liberating.
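
For readers less familiar with the pattern, a "stepping stone" release looks roughly like the sketch below. WordSplitter and both of its methods are hypothetical names invented for illustration, not anything in Lucene:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical library class as it would look in the stepping-stone release:
// the old method still works but is deprecated, and the new method exists.
class WordSplitter {
    /**
     * @deprecated use {@link #splitToList(String)}; this method would be
     *             removed in the following major release.
     */
    @Deprecated
    String[] split(String text) {
        // Delegate so old and new callers see identical behavior.
        return splitToList(text).toArray(new String[0]);
    }

    // New API that callers migrate to while the old one is still present.
    List<String> splitToList(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }
}
```

Users upgrade to this release, fix their deprecation warnings by moving to splitToList, and are then ready for the release that drops split. The cost is that both methods must coexist for a full release, and that only works when old and new can live side by side in the same class, which is harder for interfaces.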

I think nearly everyone agrees our back-compat policy is exceptionally
costly.  On a given interesting change to Lucene, a very large part of
the effort is spent on preserving back-compat. It causes all kinds of
spooky code, pollutes the APIs, causes us to go forward with sub-par
names, etc.  The freedom Marvin has to make changes to Lucy is
fabulous, though in exchange, it's not yet released...

I think most would also agree that it's far from easy even carrying
out the policy we have without making mistakes: this change (addition
of reusableTokenStream) violated our policy (I did it by accident and
nobody noticed until now).  I actually believe programming languages /
runtime envs need to provide more support for developers; we have
inadequate tools now.  But we can't wait for that...

