Posted to dev@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2009/05/18 23:06:39 UTC

Lucene's default settings & back compatibility

As we all know, Lucene's back-compat policy necessarily hurts the
out-of-the-box experience for new users: because we are only allowed
to make substantial improvements to Lucene's default settings at a major
release, new users won't see the improvements to our settings until a
major release (typically years apart).

Lucene has a number of default settings, eg some recent examples:

  * Read-only IndexReader gives much better performance with threads,
    yet we must now default IndexReader.open to return a non-readOnly
    reader

  * We can now optionally turn off scoring when sorting by field
    (sizable speed gain), but we had to leave it on by default until
    3.0

  * Letting IndexReader.norms return null

  * LogMergePolicy now takes deletions into account, but we had to
    disable it by default, since it could conceivably break back
    compat.

  * Bug fixes in StandardAnalyzer must be delayed until 3.0 since
    there's a remote chance they'd break back compat in an app, or we
    end up adding confusing methods like "public static void
    setDefaultReplaceInvalidAcronym".

  * NIOFSDirectory ought to be "the default" on UNIX, but it's not

  * Constant score rewrite ought to be the default for most multi-term
    queries

  * StopFilter should enable position increments by default

The fact that we are "forced" to delay such "out of the box" improvements
to Lucene for so long is a frustrating cost, since it can only stunt
Lucene's adoption and growth and my sense is that it's a minority of
Lucene's users that need such strict back-compat (this has been
discussed before).  It also clutters our APIs because we end up
creating setter/getters that often only exist for the sake of a back
compat preservation of a bug.

I think we can fix this.  Ie, maintain our strong back-compat policy,
yet still allow new users to experience the best of Lucene on every
release (not just on major releases), by creating an explicit class
that holds settings/defaults used by Lucene.

For example, say we create a base class named Settings.  It holds the
defaults for settings across all of Lucene's classes. When you create
IndexReader, IndexWriter and others, you must pass in a Settings
instance.

A subclass, SettingsMatching24, binds all settings to "match" 2.4's
behavior.  When we make improvements in 2.9, we'd add the back-compat
settings to SettingsMatching24.  So if your app wants to keep exactly
2.4's behavior, you'd pass in SettingsMatching24().  On upgrading to
2.9 you'd still see 2.4's behavior.

Users who'd like to see Lucene's improvements on each minor release
would instead instantiate LatestAndGreatestSettings() (or
CurrentVersionSettings(), or something), understanding that when they
upgrade there might be biggish changes to Lucene's defaults.  My guess
is most users would use this settings class.
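
A minimal sketch of how this could look, purely for illustration.
None of these classes exist in Lucene today; the method names
(readOnlyReader, trackScoresWhenSorting) and the ReaderSketch
stand-in are assumptions made up for this example:

```java
/** Hypothetical base class: one accessor per default that Lucene classes consult. */
abstract class Settings {
    /** Should IndexReader.open return a read-only reader by default? */
    abstract boolean readOnlyReader();

    /** Should scores be computed when sorting by field? */
    abstract boolean trackScoresWhenSorting();
}

/** Tracks the preferred defaults of whatever release is in use. */
class LatestAndGreatestSettings extends Settings {
    @Override boolean readOnlyReader() { return true; }
    @Override boolean trackScoresWhenSorting() { return false; }
}

/** Pinned to 2.4's behavior; later releases only add more pinned values here. */
class SettingsMatching24 extends Settings {
    @Override boolean readOnlyReader() { return false; }
    @Override boolean trackScoresWhenSorting() { return true; }
}

/** Stand-in for a class such as IndexReader that must be given a Settings instance. */
class ReaderSketch {
    final boolean readOnly;

    ReaderSketch(Settings settings) {
        // The default is taken from the settings object, not hard-coded in the class.
        this.readOnly = settings.readOnlyReader();
    }
}
```

An app that passes new SettingsMatching24() keeps a non-readOnly
reader after upgrading, while one that passes new
LatestAndGreatestSettings() picks up the read-only default.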

Doug actually suggested this exact idea a while back:

  http://www.gossamer-threads.com/lists/lucene/java-dev/54421#54421

Now that I realize we could use this to strongly decouple "users
wanting precise back-compat" from "users wanting the latest &
greatest", I think it's a very compelling solution.

If we do this I'd like to do it in 2.9, so that starting with 3.x we
are free to change default settings w/o breaking back compat.

Thoughts?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Lucene's default settings & back compatibility

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless wrote:
>> Or, the removal of StopFilter as "Standard" all together.  This coupled with
>> a QP that created phrases around stop words is a better solution.
>>     
>
> Interesting... that'd be a pretty big change to StandardAnalyzer,
> though.
>
> I can see we are spinning off lots of neat ideas, decoupled from the
> "Settings" proposal, here :)
>
>   
>> For instance, if we removed the StopFilter from the StandardAnalyzer, then
>> what?  A Settings object would not be able to account for it.
>>     
>
> Why not?  The settings object could have say a property
> "analysis.standard.enableStopFilter"?
>
>   
I think this is a great idea down the road. We shouldn't be removing 
stopwords by default, and we should have better query time stopword 
removal. It won't help out of the box performance, but it will help 
first time users get stopwords right, rather than pointing them down the 
wrong path to start as we do now.

- Mark

-- 
- Mark

http://www.lucidimagination.com





Re: Lucene's default settings & back compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

On May 19, 2009, at 1:51 PM, Michael McCandless wrote:
>
> I think you've moved onto discussing something different: should we
> relax our back compat policy.  I'm all for that discussion, but it's
> different from "given our back compat policy, how can we implement it
> w/o harming new users of Lucene".

I don't agree.  You're proposing to go off and do a bunch of work to
"fix the back compat" problem that has to do with our policies, not
with our code.  In reality, if we were more pragmatic about back compat,
there would be less of a need for it.

Sure, maybe it would still make sense to be able to emulate a certain  
setting from a version, but with a more relaxed back compat it might  
not even be possible to do that b/c the old code doesn't even exist  
and the user has no choice (well, they can not upgrade) but to use the  
better way b/c, as you pointed out, we want people to have the best  
possible experience with Lucene.  For instance, deprecated code could  
easily be removed sooner by saying:  @deprecated Will be removed in  
Version X.Y.  Use Z instead.   Seriously, it's May of '09 and we have  
deprecated constructors on IndexWriter that have been that way since  
January of 2008.  And, at the rate we're getting to 2.9 and 3.0, it  
will be 2010 before they are even removed.
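
As a rough sketch of that kind of deprecation note (WriterSketch is a
hypothetical stand-in, not IndexWriter, and "X.Y" is left as a
placeholder just as above):

```java
/** Hypothetical class with one constructor slated for removal. */
class WriterSketch {
    private final boolean autoCommit;

    /**
     * @deprecated Will be removed in version X.Y. Use {@link #WriterSketch()} instead.
     */
    @Deprecated
    WriterSketch(boolean autoCommit) {
        this.autoCommit = autoCommit;
    }

    /** Preferred constructor; the old flag is fixed to false. */
    WriterSketch() {
        this.autoCommit = false;
    }

    boolean isAutoCommit() {
        return autoCommit;
    }
}
```

The javadoc states the removal version up front, so users know how
long they have to migrate.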

-Grant


Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, May 19, 2009 at 8:56 AM, Grant Ingersoll <gs...@apache.org> wrote:

>> Why not?  The settings object could have say a property
>> "analysis.standard.enableStopFilter"?
>
> And what if it is something that has to be called in the next() chain and
> not during construction?  Are you going to want to call that every single
> time over millions upon millions of tokens in a large collection?   Even if
> it is during construction, you still might end up calling it a lot of times.

In fact, we already do that today (look at StandardTokenizer.java).

This isn't a differentiator in the current discussion ("using a Settings
class to hold defaults").  Ie, regardless of whether we use Settings
(what's being proposed), or we make awkward set/getters all over our
classes (what's done today), checking them inside inner loops is still
no good.
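
To illustrate, here is a rough sketch of resolving a setting once at
construction so the per-token loop only reads a final field.
StopFilterSketch is a made-up class, not Lucene's StopFilter, and the
boolean is assumed to have been looked up once by whoever constructs
the filter:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stop-word filter used only to show where the setting is read. */
class StopFilterSketch {
    private final Set<String> stopWords;
    // Resolved once at construction; the loop below never does a settings lookup.
    private final boolean enablePositionIncrements;

    StopFilterSketch(Set<String> stopWords, boolean enablePositionIncrements) {
        this.stopWords = stopWords;
        this.enablePositionIncrements = enablePositionIncrements;
    }

    /** Returns the position increment assigned to each token that survives filtering. */
    List<Integer> positionIncrements(List<String> tokens) {
        List<Integer> increments = new ArrayList<Integer>();
        int skipped = 0;
        for (String token : tokens) {
            if (stopWords.contains(token)) {
                skipped++;  // remember the gap left by the removed stop word
                continue;
            }
            increments.add(enablePositionIncrements ? 1 + skipped : 1);
            skipped = 0;
        }
        return increments;
    }
}
```

The same shape applies whether the boolean came from a Settings
instance or from one of today's setters.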

I think you've moved onto discussing something different: should we
relax our back compat policy.  I'm all for that discussion, but it's
different from "given our back compat policy, how can we implement it
w/o harming new users of Lucene".

> There's a difference between std. coding practices and purposefully putting
> in lots of if checks to solve back compatibility issues that are created in
> order to satisfy some naming convention.  Given the length of time between
> releases, we could easily call every new release a major version and we
> wouldn't be all that different from most commercial projects.  I'd bet if we
> switched from calling things major.minor and just called them Lucene '09 and
> Lucene '10 people would be just fine with the changes.
>
> I've said it before and I'll say it again.  Given the time between Lucene
> releases (at least 6 mos. for minor releases and 1+ year for majors) we have
> _PLENTY_ of time to let users know what is coming and plan accordingly.   By
> being so dogmatic about back compatibility, I believe we are making it
> harder to innovate and harder for new people to contribute and we keep cruft
> around for way too long.  (How the heck is a new contributor supposed to
> keep track of all the things that went into Lucene for the past 1.5 years?)
>  I'm not saying we should throw back compat. out the window, I'm just saying
> we should take it more on a case by case basis, with the default, obviously,
> being to favor back compatibility.  The large majority of users  (I'd
> venture to say well north of 95% of them) will be able to deal with minor
> API changes every 6 to 8 months, especially if we are more proactive about
> communicating them to java-user@ and in CHANGES.  In fact, if we announced
> changes that are going to break for not the next version, but the one after,
> it would give people lots of time to adapt.

You've moved onto "should we relax our back-compat policy".  Yes, we
can consider doing so... but I'd like to stay focused here on "should
we switch to the Settings* approach to implement our back compat
policy".

By using Settings that explicitly capture the defaults for each
version, we can have our cake and eat it too: we are no longer forced
to stunt Lucene's growth for the minority that need strong
back-compat.  It also makes us freer to select our back-compat policy
since it's no longer a tradeoff of hurting new users.

> I think you missed the point.  The problem lies in releasing 2.4's settings
> and those settings are wrong.  Using your example, say Settings24 was messed
> up and set trackMaxScore to true when it should have been false (mistakes
> happen).  It gets released in 2.9 as the settings for 2.4 back
> compatibility.  We then realize our mistake.  How do you fix it?  You can't
> just set it to false, b/c now you have users who are depending, potentially,
> on the _wrong_ version.  So, now you have to deprecate it and come out with
> a "new" Settings2.4 called something else.

Well... that's a rather major mistake: if you add new feature X
("scoring is optional when sorting by field") and then in the
back-compat settings you get it backwards ("turn off scoring by
default"), that's quite an error.

I would hope/expect it's quite rare.  If such a bigtime mistake
happens I think that warrants a fast point-release turnaround fixing
it.

Also, this isn't differentiating, ie we could make such a mistake
today by incorrectly defaulting one of our back-compat setters (and I
think in that case we would also turn around a fast point release to
fix it).

>>> I still think we would benefit from just communicating upcoming changes
>>> better even in minor releases, thereby allowing for a bit more variance
>>> in
>>> back compat.  It should be the exception, not the rule.
>>
>> I like DM's point, that this Settings class would be a great vehicle
>> for exactly that communication.  Rather than pouring over a
>> CHANGES.txt, you can see setting-by-setting what changed, and why.
>
> Sorry, I'd rather read CHANGES.  It is the one place we all make sure to
> enter our changes.  People aren't as good about javadocs, especially
> accessors where the name is "self explanatory".  Plus it has a link to a
> JIRA issue.

Let me restate: I think we'd do both -- CHANGES is still the
definitive place to see the exhaustive list of all changes, but
Settings* is the place to see changes where maintaining strict
back-compat costs you an important new feature.  EG because you are
using Settings24 you'd see that you're not taking advantage of the
performance gain of not computing scores when sorting by field.

> Also, how useful is it going to be to have 30 or 40 (hundreds?) accessors on
> a single Settings object?

I think the Settings24 would have far fewer?  Ie it'd have only the
settings forced to deviate from the preferred default.
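
Roughly, and only as a sketch: the back-compat object could hold
nothing but the values that deviate, falling back to the preferred
default for everything else.  OverrideSettings and the key
"index.reader.readOnly" are hypothetical; "search.sort.trackScores"
is the name floated earlier in this thread:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical settings object that stores only deviations from the preferred defaults. */
class OverrideSettings {
    private final Map<String, Boolean> overrides;

    private OverrideSettings(Map<String, Boolean> overrides) {
        this.overrides = Collections.unmodifiableMap(overrides);
    }

    /** Current release: nothing deviates. */
    static OverrideSettings current() {
        return new OverrideSettings(new HashMap<String, Boolean>());
    }

    /** 2.4 behavior: lists only the settings whose default changed after 2.4. */
    static OverrideSettings matching24() {
        Map<String, Boolean> pinned = new HashMap<String, Boolean>();
        pinned.put("search.sort.trackScores", true);  // name from this thread
        pinned.put("index.reader.readOnly", false);   // hypothetical name
        return new OverrideSettings(pinned);
    }

    /** Returns the pinned value if this version overrides it, else the preferred default. */
    boolean get(String name, boolean preferredDefault) {
        Boolean pinned = overrides.get(name);
        return pinned != null ? pinned : preferredDefault;
    }

    int overrideCount() {
        return overrides.size();
    }
}
```

The number of entries then grows with the number of defaults that
have actually changed, not with the total number of settings in
Lucene.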

> So, then, the logical thing to do is to split it up and have some
> nested way of doing things.  And then people will be tired of having
> to programmatically set all the values, so they will create a
> config/properties file that does it.  But, because we don't like
> dependencies, we will re-invent how that works.  After it's all said
> and done, you end up having re-invented IOC.

I agree there is a real risk of over-designing this.

Maybe... we only migrate things into the Settings* when they diverge
across versions?  That should keep the settings quite minimal.  Such
settings are typically deprecated anyway.  And rename it
"BackCompatSettings", or something, to make it clear.

> Another interesting thing to think about is how do we sunset old settings
> objects.  When we are on 4.X, should we still keep around 2.4 settings?  Not
> really something we necessarily need to solve right now.

That's also a policy (not implementation) question; our current policy
is to remove 2.* on releasing 3.0.  I think we'd want to stick with
that policy, ie many of these "back compat only settings" are
deprecated (eg autoCommit) so come 3.0 we can remove them.

Mike


Re: Lucene's default settings & back compatibility

Posted by Earwin Burrfoot <ea...@gmail.com>.

On Tue, May 19, 2009 at 16:56, Grant Ingersoll <gs...@apache.org> wrote:
> There's a difference between std. coding practices and purposefully putting
> in lots of if checks to solve back compatibility issues that are created in
> order to satisfy some naming convention.  Given the length of time between
> releases, we could easily call every new release a major version and we
> wouldn't be all that different from most commercial projects.
> I've said it before and I'll say it again.  Given the time between Lucene
> releases (at least 6 mos. for minor releases and 1+ year for majors) we have
> _PLENTY_ of time to let users know what is coming and plan accordingly.   By
> being so dogmatic about back compatibility, I believe we are making it
> harder to innovate and harder for new people to contribute and we keep cruft
> around for way too long.  (How the heck is a new contributor supposed to
> keep track of all the things that went into Lucene for the past 1.5 years?)
>  I'm not saying we should throw back compat. out the window, I'm just saying
> we should take it more on a case by case basis, with the default, obviously,
> being to favor back compatibility.  The large majority of users  (I'd
> venture to say well north of 95% of them) will be able to deal with minor
> API changes every 6 to 8 months, especially if we are more proactive about
> communicating them to java-user@ and in CHANGES.

> I think you missed the point.  The problem lies in releasing 2.4's settings
> and those settings are wrong.  Using your example, say Settings24 was messed
> up and set trackMaxScore to true when it should have been false (mistakes
> happen).  It gets released in 2.9 as the settings for 2.4 back
> compatibility.  We then realize our mistake.  How do you fix it?  You can't
> just set it to false, b/c now you have users who are depending, potentially,
> on the _wrong_ version.  So, now you have to deprecate it and come out with
> a "new" Settings2.4 called something else.

> Sorry, I'd rather read CHANGES.  It is the one place we all make sure to
> enter our changes.  People aren't as good about javadocs, especially
> accessors where the name is "self explanatory".  Plus it has a link to a
> JIRA issue.

> Also, how useful is it going to be to have 30 or 40 (hundreds?) accessors on
> a single Settings object?  So, then, the logical thing to do is to split it
> up and have some nested way of doing things.  And then people will be tired
> of having to programmatically set all the values, so they will create a
> config/properties file that does it.  But, because we don't like
> dependencies, we will re-invent how that works.  After it's all said and
> done, you end up having re-invented IOC.

God, let this man be heard. Please.
I mean, I agree with everything said above, if perhaps in a slightly
less tactful way.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


Re: Lucene's default settings & back compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

On May 19, 2009, at 8:19 AM, Michael McCandless wrote:

> On Tue, May 19, 2009 at 7:26 AM, Grant Ingersoll  
> <gs...@apache.org> wrote:
>
>> I don't think we have said that bug fixes are required to be back
>> compatible, even if it is in analysis.  I think it is a really bad  
>> idea for
>> TokenStreams to have if clauses in them checking boolean values for  
>> old
>> versus new behaviors.
>>
>> When they can be back compat, we do, but there is not a  
>> requirement.  For
>> instance, we upgraded Snowball.
>
> True (Snowball), but then we have discussions like this:
>
>  https://issues.apache.org/jira/browse/LUCENE-1068?focusedCommentId=12550948&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel 
> #action_12550948
>
> which added a confusing deprecated "boolean replaceDepAcronym =
> false;" to StandardAnalyzer.  Something similar led to
> StandardAnalyzer.replaceInvalidAcronym.
>
> I think there have been other cases (in particular StandardAnalyzer,
> QueryParser) over time, but I haven't tracked them down.  Analyzer
> back compat after fixing issues is especially tricky since the bugs
> get "cached" into the index and queries against that index using the
> fixed analyzer may not longer match the docs.  (So I think back-compat
> is important in Analyzers).
>
>> Or, the removal of StopFilter as "Standard" all together.  This  
>> coupled with
>> a QP that created phrases around stop words is a better solution.
>
> Interesting... that'd be a pretty big change to StandardAnalyzer,
> though.
>
> I can see we are spinning off lots of neat ideas, decoupled from the
> "Settings" proposal, here :)
>
>> For instance, if we removed the StopFilter from the  
>> StandardAnalyzer, then
>> what?  A Settings object would not be able to account for it.
>
> Why not?  The settings object could have say a property
> "analysis.standard.enableStopFilter"?

And what if it is something that has to be called in the next() chain  
and not during construction?  Are you going to want to call that every  
single time over millions upon millions of tokens in a large  
collection?   Even if it is during construction, you still might end  
up calling it a lot of times.

>
>
>> Likewise, the subtler issue of "fixing" a TokenStream such that it
>> might produce different tokens.
>
> Settings should cover this in general, I think.
>
>> I really worry about Settings objects having to be repeatedly  
>> checked inside
>> of tight inner loops.  Even looking at the new TokenStream stuff,  
>> there are
>> now checks for the "new API" in an area that is called _a lot_ of  
>> times.
>
> Agreed, but I'd say this is orthogonal.  We should never do slow
> things inside inner loops -- checking settings, calling logging
> frameworks, calling List.size(), opening files, etc.  This is the
> stuff of standard coding practices...

There's a difference between std. coding practices and purposefully  
putting in lots of if checks to solve back compatibility issues that  
are created in order to satisfy some naming convention.  Given the  
length of time between releases, we could easily call every new  
release a major version and we wouldn't be all that different from  
most commercial projects.  I'd bet if we switched from calling things  
major.minor and just called them Lucene '09 and Lucene '10 people  
would be just fine with the changes.

I've said it before and I'll say it again.  Given the time between  
Lucene releases (at least 6 mos. for minor releases and 1+ year for  
majors) we have _PLENTY_ of time to let users know what is coming and  
plan accordingly.   By being so dogmatic about back compatibility, I  
believe we are making it harder to innovate and harder for new people  
to contribute and we keep cruft around for way too long.  (How the  
heck is a new contributor supposed to keep track of all the things  
that went into Lucene for the past 1.5 years?)  I'm not saying we  
should throw back compat. out the window, I'm just saying we should  
take it more on a case by case basis, with the default, obviously,  
being to favor back compatibility.  The large majority of users  (I'd  
venture to say well north of 95% of them) will be able to deal with  
minor API changes every 6 to 8 months, especially if we are more  
proactive about communicating them to java-user@ and in CHANGES.  In  
fact, if we announced changes that are going to break for not the next  
version, but the one after, it would give people lots of time to adapt.

>
>
>> Last, and mostly I mention it as an afterthought.  How are you  
>> going to
>> handle changes to the Settings?  Say, for instance, we come out w/
>> Settings2.4, release it and then we realize we missed something  
>> (and this
>> seems likely given the number of settings available in Lucene), then
>> what?
>>
>> We deprecate Settings2.4 and come out with TheRealSettingsFor2.4?   
>> And then
>> when that is incomplete?
>
> Well, in 2.9 there would still be a Settings2.4 class, but it'd have
> newly created (in 2.9) settings with their defaults bound.
>
> So in 2.9, when sorting by field you can optionally turn off scoring.
> It gives a sizable performance boost doing so.  We of course were
> forced to leave scoring on for back compat, but if we had this
> Settings class online what we would have done instead is add a new
> "search.sort.trackScores" (and, "trackMaxScore") setting to the base
> Settings class, but the Settings2.4 would bind it to true.
>
> There should be no need to make a new class for 2.4's settings on
> releasing 2.9?

I think you missed the point.  The problem lies in releasing 2.4's  
settings and those settings are wrong.  Using your example, say  
Settings24 was messed up and set trackMaxScore to true when it should  
have been false (mistakes happen).  It gets released in 2.9 as the  
settings for 2.4 back compatibility.  We then realize our mistake.   
How do you fix it?  You can't just set it to false, b/c now you have  
users who are depending, potentially, on the _wrong_ version.  So, now  
you have to deprecate it and come out with a "new" Settings2.4 called  
something else.

>
>
>> I still think we would benefit from just communicating upcoming  
>> changes
>> better even in minor releases, thereby allowing for a bit more  
>> variance in
>> back compat.  It should be the exception, not the rule.
>
> I like DM's point, that this Settings class would be a great vehicle
> for exactly that communication.  Rather than pouring over a
> CHANGES.txt, you can see setting-by-setting what changed, and why.

Sorry, I'd rather read CHANGES.  It is the one place we all make sure  
to enter our changes.  People aren't as good about javadocs,  
especially accessors where the name is "self explanatory".  Plus it  
has a link to a JIRA issue.

Also, how useful is it going to be to have 30 or 40 (hundreds?)  
accessors on a single Settings object?  So, then, the logical thing to  
do is to split it up and have some nested way of doing things.  And  
then people will be tired of having to programmatically set all the  
values, so they will create a config/properties file that does it.   
But, because we don't like dependencies, we will re-invent how that  
works.  After it's all said and done, you end up having re-invented IOC.

Another interesting thing to think about is how do we sunset old  
settings objects.  When we are on 4.X, should we still keep around 2.4  
settings?  Not really something we necessarily need to solve right now.

-Grant


Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, May 19, 2009 at 7:26 AM, Grant Ingersoll <gs...@apache.org> wrote:

> I don't think we have said that bug fixes are required to be back
> compatible, even if it is in analysis.  I think it is a really bad idea for
> TokenStreams to have if clauses in them checking boolean values for old
> versus new behaviors.
>
> When they can be back compat, we do, but there is not a requirement.  For
> instance, we upgraded Snowball.

True (Snowball), but then we have discussions like this:

  https://issues.apache.org/jira/browse/LUCENE-1068?focusedCommentId=12550948&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12550948

which added a confusing deprecated "boolean replaceDepAcronym =
false;" to StandardAnalyzer.  Something similar led to
StandardAnalyzer.replaceInvalidAcronym.

I think there have been other cases (in particular StandardAnalyzer,
QueryParser) over time, but I haven't tracked them down.  Analyzer
back compat after fixing issues is especially tricky since the bugs
get "cached" into the index and queries against that index using the
fixed analyzer may no longer match the docs.  (So I think back-compat
is important in Analyzers).

> Or, the removal of StopFilter as "Standard" all together.  This coupled with
> a QP that created phrases around stop words is a better solution.

Interesting... that'd be a pretty big change to StandardAnalyzer,
though.

I can see we are spinning off lots of neat ideas, decoupled from the
"Settings" proposal, here :)

> For instance, if we removed the StopFilter from the StandardAnalyzer, then
> what?  A Settings object would not be able to account for it.

Why not?  The settings object could have say a property
"analysis.standard.enableStopFilter"?

> Likewise, the subtler issue of "fixing" a TokenStream such that it
> might produce different tokens.

Settings should cover this in general, I think.

> I really worry about Settings objects having to be repeatedly checked inside
> of tight inner loops.  Even looking at the new TokenStream stuff, there are
> now checks for the "new API" in an area that is called _a lot_ of times.

Agreed, but I'd say this is orthogonal.  We should never do slow
things inside inner loops -- checking settings, calling logging
frameworks, calling List.size(), opening files, etc.  This is the
stuff of standard coding practices...

> Last, and mostly I mention it as an afterthought.  How are you going to
> handle changes to the Settings?  Say, for instance, we come out w/
> Settings2.4, release it and then we realize we missed something (and this
> seems likely given the number of settings available in Lucene), then
> what?
>
> We deprecate Settings2.4 and come out with TheRealSettingsFor2.4?  And then
> when that is incomplete?

Well, in 2.9 there would still be a Settings2.4 class, but it'd have
newly created (in 2.9) settings with their defaults bound.

So in 2.9, when sorting by field you can optionally turn off scoring.
It gives a sizable performance boost doing so.  We of course were
forced to leave scoring on for back compat, but if we had this
Settings class online what we would have done instead is add a new
"search.sort.trackScores" (and, "trackMaxScore") setting to the base
Settings class, but the Settings2.4 would bind it to true.

There should be no need to make a new class for 2.4's settings on
releasing 2.9?
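
As a sketch of that step (all names here are hypothetical, including
the trackScores/trackMaxScore accessors and the SortSketch caller):
releasing 2.9 adds the new setting to the base class with its
preferred default, and the existing 2.4 class just gains an override.

```java
/** Hypothetical base class; accessors return the current release's preferred defaults. */
class BaseSettings {
    // New in 2.9: skipping score computation when sorting by field is the faster default.
    boolean trackScores() { return false; }
    boolean trackMaxScore() { return false; }
}

/** The same 2.4 class as before; for 2.9 it only gains these two overrides. */
class Settings24 extends BaseSettings {
    @Override boolean trackScores() { return true; }
    @Override boolean trackMaxScore() { return true; }
}

/** Stand-in for search code that consults the settings when sorting by field. */
class SortSketch {
    /** Describes what a sort-by-field search would compute under the given settings. */
    static String plan(BaseSettings settings) {
        return settings.trackScores() ? "sort+scores" : "sort-only";
    }
}
```

Code written against 2.4 that passes new Settings24() compiles
unchanged and still gets scores when sorting.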

> I still think we would benefit from just communicating upcoming changes
> better even in minor releases, thereby allowing for a bit more variance in
> back compat.  It should be the exception, not the rule.

I like DM's point, that this Settings class would be a great vehicle
for exactly that communication.  Rather than poring over a
CHANGES.txt, you can see setting-by-setting what changed, and why.

Mike


Re: Lucene's default settings & back compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

I like the idea, some thoughts below.

On May 18, 2009, at 5:06 PM, Michael McCandless wrote:

> As we all know, Lucene's back-compat policy necessarily hurts the
> out-of-the-box experience for new users: because we are only allowed
> make substantial improvements to Lucene's default settings at a major
> release, new users won't see the improvements to our settings until a
> major release (typically years apart).
>
> Lucene has a number of default settings, eg some recent examples:
>
>  * Read-only IndexReader gives better much performance with threads,
>    yet we must now default IndexReader.open to return a non-readOnly
>    reader
>
>  * We can now optionally turn off scoring when sorting by field
>    (sizable speed gain), but we had to leave it on by default until
>    3.0
>
>  * Letting IndexReader.norms return null
>
>  * LogMergePolicy now takes deletions into account, but we had to
>    disable it by default, since it could conceivably break back
>    compat.
>
>  * Bug fixes in StandardAnalyzer must be delayed until 3.0 since
>    there's a remote chance they'd break back compat in an app, or we
>    end up adding confusing methods like "public static void
>    setDefaultReplaceInvalidAcronym".

I don't think we have said that bug fixes are required to be back  
compatible, even if it is in analysis.  I think it is a really bad  
idea for TokenStreams to have if clauses in them checking boolean  
values for old versus new behaviors.

When they can be back compat, we do, but there is not a requirement.   
For instance, we upgraded Snowball.

>
>
>  * NIOFSDirectory ought to be "the default" on UNIX, but it's not
>
>  * Constant score rewrite ought to be the default for most multi-term
>    queries
>
>  * StopFilter should enable position increments by default
>

Or, the removal of StopFilter as "Standard" altogether.  This
coupled with a QP that created phrases around stop words is a better  
solution.

> The fact that we are "forced" delay such "out of the box" improvements
> to Lucene for so long is a frustrating cost, since it can only stunt
> Lucene's adoption and growth and my sense is that it's a minority of
> Lucene's users that need such strict back-compat (this has been
> discussed before).  It also clutters our APIs because we end up
> creating setter/getters that often only exist for the sake of a back
> compat preservation of a bug.
>
> I think we can fix this.  Ie, maintain our strong back-compat policy,
> yet still allow new users to experience the best of Lucene on every
> release (not just on major releases), by creating an explicit class
> that holds settings/defaults used by Lucene.
>
> For example, say we create a base class named Settings.  It holds the
> defaults for settings across all of Lucene's classes. When you create
> IndexReader, IndexWriter and others, you must pass in a Settings
> instance.
>
> A subclass, SettingsMatching24, binds all settings to "match" 2.4's
> behavior.  When we make improvements in 2.9, we'd add the back-compat
> settings to SettingsMatching24.  So if your app wants to keep exactly
> 2.4's behavior, you'd pass in SettingsMatching24().  On upgrading to
> 2.9 you'd still see 2.4's behavior.
>
> Users who'd like to see Lucene's improvements on each minor release
> would instead instantiate LatestAndGreatestSettings() (or
> CurrentVersionSettings(), or something), understanding that when they
> upgrade there might be biggish changes to Lucene's defaults.  My guess
> is most users would use this settings class.
>
> Doug actually suggested this exact idea a while back:
>
>  http://www.gossamer-threads.com/lists/lucene/java-dev/54421#54421.
>
> Now that I realize we could use this to strongly decouple "users
> wanting precise back-compat" from "users wanting the latest &
> greatest", I think it's a very compelling solution.
>
> If we do this I'd like to do it in 2.9, so that starting with 3.x we
> are free to change default settings w/o breaking back compat.
>
> Thoughts?

For instance, if we removed the StopFilter from the StandardAnalyzer,  
then what?  A Settings object would not be able to account for it.    
Likewise, the subtler issue of "fixing" a TokenStream such that it  
might produce different tokens.

I really worry about Settings objects having to be repeatedly checked  
inside of tight inner loops.  Even looking at the new TokenStream  
stuff, there are now checks for the "new API" in an area that is  
called _a lot_ of times.
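
One way to keep such checks out of the inner loop is to resolve the setting once, when the stream is constructed, and bind the chosen behavior to a field. The following is only a minimal sketch under stated assumptions: SketchTokenFilter and TokenFix are hypothetical names, not Lucene classes, and "trim a trailing dot" is a placeholder behavior, not what StandardAnalyzer actually does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: stands in for one old-versus-new behavior switch.
interface TokenFix {
    String apply(String token);
}

// Hypothetical filter; it does not extend any real Lucene TokenStream.
final class SketchTokenFilter {
    // The behavior is chosen once here, not looked up again per token.
    private final TokenFix fix;

    SketchTokenFilter(boolean useNewBehavior) {
        this.fix = useNewBehavior
            ? token -> token.endsWith(".") ? token.substring(0, token.length() - 1) : token
            : token -> token; // old behavior: leave the token untouched
    }

    List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String t : tokens) {
            out.add(fix.apply(t)); // no settings object is consulted in the loop
        }
        return out;
    }
}
```

This trades a boolean test per token for an interface call per token, so whether it is actually cheaper would have to be measured.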

Last, and mostly as an afterthought: how are you going to handle  
changes to the Settings?  Say, for instance, we come out w/  
Settings2.4, release it, and then realize we missed something (and  
this seems likely given the number of settings available in Lucene).  
Then what?  We deprecate Settings2.4 and come out with  
TheRealSettingsFor2.4?  And then when that is incomplete?

I still think we would benefit from just communicating upcoming  
changes better even in minor releases, thereby allowing for a bit more  
variance in back compat.  It should be the exception, not the rule.

Still, I think this is worth pursuing.

-Grant

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, May 19, 2009 at 4:34 AM, mark harwood <ma...@yahoo.co.uk> wrote:
>
>>When you create IndexReader, IndexWriter and others, you must pass in a Settings
>> instance.
>
> I think this would also help solve the steady growth of constructor variations (18 in 2.4's IndexWriter vs 3 in Lucene 1.9).

Right.  So for example the transition of IndexWriter from
autoCommit=true to autoCommit=false would have been quite a bit
cleaner if we had *Settings classes.  We would have left
SettingsMatching23 with autoCommit=true, and
SettingsMatching24/CurrentVersionSettings would set autoCommit=false,
without doubling IndexWriter's ctors.
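
A minimal sketch of how that could look, assuming the class names floated in this thread. None of these classes exist in Lucene; SketchWriter is a hypothetical stand-in for IndexWriter, and the readOnlyReader setting is included only to illustrate a second default changing between versions.

```java
// Hypothetical sketch: these classes are illustrations, not Lucene APIs.
abstract class Settings {
    /** Whether the writer commits automatically. */
    abstract boolean autoCommit();

    /** Whether opening a reader returns a read-only instance. */
    abstract boolean readOnlyReader();
}

/** Pins every default to (assumed) 2.3 behavior. */
final class SettingsMatching23 extends Settings {
    @Override boolean autoCommit() { return true; }
    @Override boolean readOnlyReader() { return false; }
}

/** Pins every default to (assumed) 2.4 behavior. */
class SettingsMatching24 extends Settings {
    @Override boolean autoCommit() { return false; }
    @Override boolean readOnlyReader() { return false; }
}

/** Tracks the newest defaults; free to change on every release. */
final class CurrentVersionSettings extends SettingsMatching24 {
    @Override boolean readOnlyReader() { return true; }
}

/** Stand-in for IndexWriter: one ctor, defaults come from Settings. */
final class SketchWriter {
    private final boolean autoCommit;

    SketchWriter(Settings settings) {
        this.autoCommit = settings.autoCommit();
    }

    boolean isAutoCommit() { return autoCommit; }
}

public class SettingsSketch {
    public static void main(String[] args) {
        System.out.println(new SketchWriter(new SettingsMatching23()).isAutoCommit());
        System.out.println(new SketchWriter(new CurrentVersionSettings()).isAutoCommit());
    }
}
```

An app that wants frozen behavior keeps passing SettingsMatching23 across upgrades; everyone else passes CurrentVersionSettings and accepts that defaults may move.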

Though we'd need clear guidelines on things that become settings vs
things that remain args to ctors, or set/gets.  Should
IndexDeletionPolicy be a setting?  (I think so?  It's shared b/w
IndexWriter & IndexReader doing "write" ops).  Maybe
MergePolicy/Scheduler should be a setting, so we have freedom to
improve the default with time.

The Analyzer instance?  The Similarity instance (which is used both
during indexing & searching)?

I'm torn on IndexWriter's MaxFieldLength: it was graduated to a
ctor arg explicitly so you're forced to choose whether to have your
fields truncated or not (since it was a common hidden trap).

Mike


Re: Lucene's default settings & back compatibility

Posted by mark harwood <ma...@yahoo.co.uk>.

>When you create IndexReader, IndexWriter and others, you must pass in a Settings
> instance.

I think this would also help solve the steady growth of constructor variations (18 in 2.4's IndexWriter vs 3 in Lucene 1.9).







Re: Lucene's default settings & back compatibility

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Me like!

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

On May 19, 2009, at 7:45 AM, Michael McCandless wrote:

> On Tue, May 19, 2009 at 6:47 AM, DM Smith <dm...@gmail.com>  
> wrote:
>
>> It is common in my application, a Bible program, that indexes each verse
>> (think of a verse as a numbered sentence) as a separate document. We index
>> everything, including words that are typically stop words as those might be
>> important to our end users. Besides this, the top 280 word roots represent
>> 90% of the occurrences.
>> And on searches, we return everything in book order, unless the user wants
>> to score the result. In that case, we return a small, user configurable
>> amount of hits ordered by score.
>
> The ability to turn off scoring when sorting by field, new in 2.9,
> should be a good performance boost for your use case (if performance
> is important).
>
>> And we are using Lucene out of the box for the most part. We've  
>> deviated
>> only to incrementally solve performance problems.
>
> Right, my impression is most people will stick w/ Lucene's defaults,
> incrementally changing only limited settings they come across, which
> is why selecting good defaults is vital to Lucene's growth/adoption
> (new users especially simply start w/ our defaults).
>
> But we can't pick good defaults when we're so heavily bound by
> back-compat.
>
> Which is why I find the Settings approach so appealing :)  Suddenly,
> on all improvements to Lucene, we have the freedom to change our
> defaults so a new user sees all such improvements.

From my perspective as a user:
Backward compatibility is important, but it is not a be-all and end-all.

To me, if I can drop in the new jar and get bug fixes that's great. My  
expectation is that searches against an existing index will still  
return the same or, in the case of bug fixes, better results.

What I need to know is when that is not the case. Today, we use a  
naming convention of the Lucene jars to indicate whether that is true.  
I'd be just as happy if there were a compatibility level that I could  
check (I'm having to do that in our code as I change our analyzers  
frequently enough to be embarrassed).
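
As a rough illustration of the kind of check I mean, here is a minimal sketch. It assumes the application stores a level number alongside the index when it is built; CompatCheck and CURRENT_LEVEL are hypothetical names, and Lucene exposes no such constant in this discussion.

```java
// Hypothetical: an application-side compatibility check, not a Lucene API.
final class CompatCheck {
    /** Level that the current library and analyzer chain are compatible with. */
    static final int CURRENT_LEVEL = 3;

    /**
     * True if an index built at storedLevel should be rebuilt.
     * Any mismatch counts, since an index built by a newer
     * analyzer chain is just as suspect as one built by an older one.
     */
    static boolean needsReindex(int storedLevel) {
        return storedLevel != CURRENT_LEVEL;
    }
}
```

The application would bump CURRENT_LEVEL whenever an analyzer change or library upgrade can alter search results, and call needsReindex when opening an existing index.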

The problem, which might be addressed in the "fixing" of core vs  
contrib, is that we use lots of contrib (analyzers, snowball,  
highlighting) and want it to maintain backward compatibility too. (I'm  
happy that has been the case!) So, perhaps a compatibility level per  
contribution.

The packagers for jpackage consider nearly every release of Lucene to  
break backward compatibility, because they treat Lucene as a whole.  
Perhaps that is the same with other Linux distributions. But because  
backward compatibility does not apply to contrib in a strict fashion,  
one cannot reliably use Lucene from distributions unless such a policy  
is the case.

In any case, I don't think anyone should just drop in a new jar  
without some testing. At a minimum, they should compile with  
deprecations turned on.

Regarding deprecations, I'd also be just as happy if a method was marked
	@deprecated This behavior <b>has</b> changed with this release,  
2.4.3.
That is, as a warning of changed behavior.

And then on the 3.0 release the warning could be removed.

But then again, my use of Lucene, while very important to my  
application, is very simple and easy to change.

-- DM


Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, May 19, 2009 at 6:47 AM, DM Smith <dm...@gmail.com> wrote:

> It is common in my application, a Bible program, that indexes each verse
> (think of a verse as a numbered sentence) as a separate document. We index
> everything, including words that are typically stop words as those might be
> important to our end users. Besides this, the top 280 word roots represent
> 90% of the occurrences.
> And on searches, we return everything in book order, unless the user wants
> to score the result. In that case, we return a small, user configurable
> amount of hits ordered by score.

The ability to turn off scoring when sorting by field, new in 2.9,
should be a good performance boost for your use case (if performance
is important).

> And we are using Lucene out of the box for the most part. We've deviated
> only to incrementally solve performance problems.

Right, my impression is most people will stick w/ Lucene's defaults,
incrementally changing only limited settings they come across, which
is why selecting good defaults is vital to Lucene's growth/adoption
(new users especially simply start w/ our defaults).

But we can't pick good defaults when we're so heavily bound by back-compat.

Which is why I find the Settings approach so appealing :)  Suddenly,
on all improvements to Lucene, we have the freedom to change our
defaults so a new user sees all such improvements.

Mike


Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

On May 18, 2009, at 11:31 PM, Robert Muir wrote:

> I am curious about this, do you think it's a better default because  
> it avoids the max boolean clauses problem? or because for a lot of  
> these scoring doesn't make much sense anyway?
>
> I ran tests on a pretty big index, you pay a price for the constant  
> score/filter method. It's slower for the common case searches, it  
> only starts to win for queries that return > 10% or so of the index,  
> but it's significantly slower for narrow queries...
>
> I'm just trying to imagine a case where queries that return > 10% or  
> so of the index are actually the common/default...?

It is common in my application, a Bible program, that indexes each  
verse (think of a verse as a numbered sentence) as a separate  
document. We index everything, including words that are typically stop  
words as those might be important to our end users. Besides this, the  
top 280 word roots represent 90% of the occurrences.

And on searches, we return everything in book order, unless the user  
wants to score the result. In that case, we return a small, user  
configurable amount of hits ordered by score.

And we are using Lucene out of the box for the most part. We've  
deviated only to incrementally solve performance problems.

>
>
>
>  * Constant score rewrite ought to be the default for most multi-term
>    queries
>
>
>
>
> -- 
> Robert Muir
> rcmuir@gmail.com

Re: Lucene's default settings & back compatibility

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Tue, May 19, 2009 at 8:50 AM, Robert Muir <rc...@gmail.com> wrote:
> in my tests the problem seemed to boil down to iteration of a sparse
> openbitset... so maybe the filter approach is still an option but when #
> docs is small some other doc id set impl is used?

Directly using the BooleanQuery skips any intermediate step of filter
creation, so this should be fastest when the number of terms is small.

-Yonik


Re: Lucene's default settings & back compatibility

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Tue, May 19, 2009 at 2:29 PM, Shai Erera <se...@gmail.com> wrote:
> Is this the time and place to re-raise a previous discussion about moving
> SweetSpotSimilarity to core and move to use it?

SweetSpotSimilarity wouldn't make a good default.  It's a flat topped
hill that falls suddenly off on either side.  Short documents and long
documents are seen as less relevant.  That requires careful tuning and
will only work for certain fields and collections.
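
To make the "flat topped hill" concrete, here is a minimal sketch of the length-norm shape as I recall it from contrib's SweetSpotSimilarity. It is written from memory and should be checked against the contrib source; the class and parameter names here are my own, not the real API.

```java
// Sketch of the plateau-shaped length norm; names are hypothetical.
final class SweetSpotSketch {
    /**
     * Returns 1.0 for any length inside [min, max] (the flat top) and
     * falls off on both sides, faster for larger steepness values.
     */
    static double lengthNorm(int numTerms, int min, int max, double steepness) {
        // Zero inside the plateau, twice the distance to the nearest edge outside it.
        double outside = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * outside + 1.0);
    }
}
```

With min=10, max=100 and steepness=0.5, a 50-term field gets a norm of 1.0 while 5-term and 200-term fields both score lower, which is why the plateau has to be tuned per field and per collection.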

-Yonik
http://www.lucidimagination.com


Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

Is this the time and place to re-raise a previous discussion about moving
SweetSpotSimilarity to core and move to use it?

On Tue, May 19, 2009 at 8:54 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Tue, May 19, 2009 at 9:15 AM, Robert Muir <rc...@gmail.com> wrote:
> > none of my queries are "wicked fast" on 100M doc index!
>
> OK.
>
> > for narrow queries, we are talking about ~100ms queries becoming ~400ms or
> > so with the constant score rewrite...
> > for wide queries, we are talking about maybe 3 or 4s queries becoming 2s or
> > so with the constant score rewrite..., it depends on how wide the query
> > is...
> >
> > I agree with fixing the "wicked slow" queries, but currently I think the
> > general case would lose pretty bad (the way it works now), and only a corner
> > case wins.
>
> I opened LUCENE-1644 for this; I think if we can do constant score
> BooleanQuery, then MultiTermQuery is free to pick & choose the best
> way to run the query.
>
> Mike
>
>
>

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, May 19, 2009 at 9:15 AM, Robert Muir <rc...@gmail.com> wrote:
> none of my queries are "wicked fast" on 100M doc index!

OK.

> for narrow queries, we are talking about ~100ms queries becoming ~400ms or
> so with the constant score rewrite...
> for wide queries, we are talking about maybe 3 or 4s queries becoming 2s or
> so with the constant score rewrite..., it depends on how wide the query
> is...
>
> I agree with fixing the "wicked slow" queries, but currently I think the
> general case would lose pretty bad (the way it works now), and only a corner
> case wins.

I opened LUCENE-1644 for this; I think if we can do constant score
BooleanQuery, then MultiTermQuery is free to pick & choose the best
way to run the query.

Mike


Re: Lucene's default settings & back compatibility

Posted by Robert Muir <rc...@gmail.com>.

none of my queries are "wicked fast" on 100M doc index!

for narrow queries, we are talking about ~100ms queries becoming ~400ms or
so with the constant score rewrite...
for wide queries, we are talking about maybe 3 or 4s queries becoming 2s or
so with the constant score rewrite..., it depends on how wide the query
is...

I agree with fixing the "wicked slow" queries, but currently I think the
general case would lose pretty bad (the way it works now), and only a corner
case wins.

On Tue, May 19, 2009 at 9:02 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Tue, May 19, 2009 at 8:50 AM, Robert Muir <rc...@gmail.com> wrote:
> > in my tests the problem seemed to boil down to iteration of a sparse
> > openbitset... so maybe the filter approach is still an option but when #
> > docs is small some other doc id set impl is used?
>
> Interesting... was your test a case where wicked fast queries became
> only somewhat fast?  Or did you actually see slowish queries get much
> slower?
>
> In general, I'm less concerned about the former than the latter... I
> think it's the wicked slow queries in Lucene that we need to focus on.
>
> Also, LUCENE-1536 (apply filters via random access API) should
> independently address this, as well as filters-as-BooleanClause.
>
> But I'll include this in the issue; eg, I think MultiTermQuery could
> choose sparse vs dense bit set impl
>
> Mike
>
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, May 19, 2009 at 8:50 AM, Robert Muir <rc...@gmail.com> wrote:
> in my tests the problem seemed to boil down to iteration of a sparse
> openbitset... so maybe the filter approach is still an option but when #
> docs is small some other doc id set impl is used?

Interesting... was your test a case where wicked fast queries became
only somewhat fast?  Or did you actually see slowish queries get much
slower?

In general, I'm less concerned about the former than the latter... I
think it's the wicked slow queries in Lucene that we need to focus on.

Also, LUCENE-1536 (apply filters via random access API) should
independently address this, as well as filters-as-BooleanClause.

But I'll include this in the issue; eg, I think MultiTermQuery could
choose sparse vs dense bit set impl
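
A minimal sketch of that choice, using only the JDK. The DocIds interface and both implementations are hypothetical, not Lucene's DocIdSet classes, and the 1/32 threshold is an assumption based purely on memory break-even (4 bytes per int versus 1 bit per document), not on measured iteration speed.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical minimal doc id set abstraction.
interface DocIds {
    boolean contains(int doc);
    int size();
}

/** Sparse case: a sorted int array, searched by binary search. */
final class SparseDocIds implements DocIds {
    private final int[] sorted;

    SparseDocIds(int[] sortedDocs) { this.sorted = sortedDocs.clone(); }

    public boolean contains(int doc) { return Arrays.binarySearch(sorted, doc) >= 0; }
    public int size() { return sorted.length; }
}

/** Dense case: one bit per document in the index. */
final class DenseDocIds implements DocIds {
    private final BitSet bits;
    private final int count;

    DenseDocIds(int[] docs, int maxDoc) {
        bits = new BitSet(maxDoc);
        for (int d : docs) bits.set(d);
        count = bits.cardinality();
    }

    public boolean contains(int doc) { return doc >= 0 && bits.get(doc); }
    public int size() { return count; }
}

final class DocIdSets {
    /** Picks the sparse form when fewer than maxDoc/32 docs match (assumed threshold). */
    static DocIds build(int[] sortedDocs, int maxDoc) {
        boolean sparse = (long) sortedDocs.length * 32 < maxDoc;
        return sparse ? new SparseDocIds(sortedDocs) : new DenseDocIds(sortedDocs, maxDoc);
    }
}
```

Iterating the sparse form touches only the matching docs, which is the case Robert's test found slow with a sparse OpenBitSet.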

Mike


Re: Lucene's default settings & back compatibility

Posted by Robert Muir <rc...@gmail.com>.

in my tests the problem seemed to boil down to iteration of a sparse
openbitset... so maybe the filter approach is still an option but when #
docs is small some other doc id set impl is used?

On Tue, May 19, 2009 at 8:28 AM, Mark Miller <ma...@gmail.com> wrote:

> Michael McCandless wrote:
>
>> On Mon, May 18, 2009 at 11:31 PM, Robert Muir <rc...@gmail.com> wrote:
>>
>>
>>> I am curious about this, do you think it's a better default because it
>>> avoids the max boolean clauses problem? or because for a lot of these
>>> scoring doesn't make much sense anyway?
>>>
>>>
>>
>> I think you're referring to constant score mode default, for
>> MultiTermQuery & QueryParser, right?
>>
>>
>>
>>> I ran tests on a pretty big index, you pay a price for the constant
>>> score/filter method. It's slower for the common case searches, it only
>>> starts to win for queries that return > 10% or so of the index, but it's
>>> significantly slower for narrow queries...
>>>
>>> I'm just trying to imagine a case where queries that return > 10% or so
>>> of the index are actually the common/default...?
>>>
>>>
>>
>> Excellent points!  And this also makes clear why healthy discussion on
>> each default is important, as well as how good it'd be to have
>> Settings online so that we are free to even have such discussions
>> (vs being bound by back-compat which prevents any improvements
>> to the defaults).
>>
>> I was actually referring to the fact that scores for MultiTermQuery
>> rewritten to BooleanQuery are often meaningless to the app (I
>> think?).  But you're right the performance cost of the "make a filter
>> up front" approach is too high for smallish queries.
>>
>> Thinking more on this... I'd love to have a constant-score mode, but
>> implemented as a BooleanQuery, meaning the scores would be the same
>> (constant) regardless of whether under-the-hood the query was
>> rewritten to BooleanQuery vs pre-compiled up front into a BitSet.
>>
>> This would then decouple scoring from rewrite method, which in turn
>> would give us the freedom to pick and choose the fastest impl based on
>> particulars of the query.
>>
>> So if we had such a ConstantScoreBooleanQuery, and we fixed
>> MultiTermQuery to conditionally use that, then I think we'd want
>> MultiTermQuery to do constant scoring by default.  (And, it'd then be
>> free to pick whether "create filter up front" or "use
>> ConstantScoreBooleanQuery" was most performant, query by query).
>>
>> Mike
>>
>>
>>
>>
> +1. ConstantScoreQuery is only a performance win when there are lots of
> matches (it seems), but the lack of TooManyClauses exceptions is also a big
> win. I want the best of both worlds :)
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, May 19, 2009 at 8:28 AM, Mark Miller <ma...@gmail.com> wrote:

>> Thinking more on this... I'd love to have a constant-score mode, but
>> implemented as a BooleanQuery, meaning the scores would be the same
>> (constant) regardless of whether under-the-hood the query was
>> rewritten to BooleanQuery vs pre-compiled up front into a BitSet.
>
> +1. ConstantScoreQuery is only a performance win when there are lots of
> matches (it seems), but the lack of TooManyClauses exceptions is also a big
> win. I want the best of both worlds :)

OK I'll open an issue to give BooleanQuery a constant score mode, and
fix MultiTermQuery to use it so that "constant scoring" and "use
up-front filter vs BooleanQuery" are nearly orthogonal decisions.
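
Once scoring is constant either way, the choice of execution strategy could be a simple per-query heuristic. Here is a minimal sketch, assuming hypothetical names (RewriteMethod, RewriteChooser) and using the two signals raised in this thread: the clause limit (1024 is BooleanQuery's default max clause count) and Robert's observation that the filter only wins above roughly 10% of the index.

```java
// Hypothetical names; this is not the MultiTermQuery API.
enum RewriteMethod { CONSTANT_SCORE_BOOLEAN, CONSTANT_SCORE_FILTER }

final class RewriteChooser {
    /** BooleanQuery's default clause limit; exceeding it throws TooManyClauses. */
    static final int MAX_CLAUSES = 1024;

    /**
     * Picks how to execute a multi-term query whose scores are constant
     * regardless of the strategy chosen.
     */
    static RewriteMethod choose(int termCount, long estimatedHits, long maxDoc) {
        if (termCount > MAX_CLAUSES) {
            return RewriteMethod.CONSTANT_SCORE_FILTER; // BooleanQuery would overflow
        }
        if (estimatedHits * 10 > maxDoc) {
            return RewriteMethod.CONSTANT_SCORE_FILTER; // wide query: more than ~10% of the index
        }
        return RewriteMethod.CONSTANT_SCORE_BOOLEAN;    // narrow query: skip filter creation
    }
}
```

The 10% cutoff is taken from one set of measurements in this thread and would need benchmarking before becoming a default.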

Mike


Re: Lucene's default settings & back compatibility

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless wrote:
> On Mon, May 18, 2009 at 11:31 PM, Robert Muir <rc...@gmail.com> wrote:
>   
>> I am curious about this, do you think it's a better default because it avoids
>> the max boolean clauses problem? or because for a lot of these scoring
>> doesn't make much sense anyway?
>>     
>
> I think you're referring to constant score mode default, for
> MultiTermQuery & QueryParser, right?
>
>   
>> I ran tests on a pretty big index, you pay a price for the constant
>> score/filter method. It's slower for the common case searches, it only starts
>> to win for queries that return > 10% or so of the index, but it's significantly
>> slower for narrow queries...
>>
>> I'm just trying to imagine a case where queries that return > 10% or so of
>> the index are actually the common/default...?
>>     
>
> Excellent points!  And this also makes clear why healthy discussion on
> each default is important, as well as how good it'd be to have
> Settings online so that we are free to even have such discussions
> (vs being bound by back-compat which prevents any improvements
> to the defaults).
>
> I was actually referring to the fact that scores for MultiTermQuery
> rewritten to BooleanQuery are often meaningless to the app (I
> think?).  But you're right the performance cost of the "make a filter
> up front" approach is too high for smallish queries.
>
> Thinking more on this... I'd love to have a constant-score mode, but
> implemented as a BooleanQuery, meaning the scores would be the same
> (constant) regardless of whether under-the-hood the query was
> rewritten to BooleanQuery vs pre-compiled up front into a BitSet.
>
> This would then decouple scoring from rewrite method, which in turn
> would give us the freedom to pick and choose the fastest impl based on
> particulars of the query.
>
> So if we had such a ConstantScoreBooleanQuery, and we fixed
> MultiTermQuery to conditionally use that, then I think we'd want
> MultiTermQuery to do constant scoring by default.  (And, it'd then be
> free to pick whether "create filter up front" or "use
> ConstantScoreBooleanQuery" was most performant, query by query).
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>   
+1. ConstantScoreQuery is only a performance win when there are lots of 
matches (it seems), but the lack of TooManyClauses exceptions is also a 
big win. I want the best of both worlds :)

-- 
- Mark

http://www.lucidimagination.com





Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Mon, May 18, 2009 at 11:31 PM, Robert Muir <rc...@gmail.com> wrote:
> I am curious about this, do you think it's a better default because it avoids
> the max boolean clauses problem? or because for a lot of these scoring
> doesn't make much sense anyway?

I think you're referring to constant score mode default, for
MultiTermQuery & QueryParser, right?

> I ran tests on a pretty big index, you pay a price for the constant
> score/filter method. It's slower for the common case searches, it only starts
> to win for queries that return > 10% or so of the index, but it's significantly
> slower for narrow queries...
>
> I'm just trying to imagine a case where queries that return > 10% or so of
> the index are actually the common/default...?

Excellent points!  And this also makes clear why healthy discussion on
each default is important, as well as how good it'd be to have
Settings online so that we are free to even have such discussions
(vs being bound by back-compat which prevents any improvements
to the defaults).

I was actually referring to the fact that scores for MultiTermQuery
rewritten to BooleanQuery are often meaningless to the app (I
think?).  But you're right the performance cost of the "make a filter
up front" approach is too high for smallish queries.

Thinking more on this... I'd love to have a constant-score mode, but
implemented as a BooleanQuery, meaning the scores would be the same
(constant) regardless of whether under-the-hood the query was
rewritten to BooleanQuery vs pre-compiled up front into a BitSet.

This would then decouple scoring from rewrite method, which in turn
would give us the freedom to pick and choose the fastest impl based on
particulars of the query.

So if we had such a ConstantScoreBooleanQuery, and we fixed
MultiTermQuery to conditionally use that, then I think we'd want
MultiTermQuery to do constant scoring by default.  (And, it'd then be
free to pick whether "create filter up front" or "use
ConstantScoreBooleanQuery" was most performant, query by query).
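
To make the decoupling concrete, here is a minimal sketch in plain Java
(no Lucene classes). Everything in it is hypothetical: the class and enum
names, the 1024-clause limit and the 10% cutoff (borrowed from Robert's
rough measurement above) are assumptions for illustration only, not an
existing or proposed API. The point is just that the score is the same
constant whichever execution strategy gets picked.

```java
/** Hypothetical sketch: scoring is always constant; only the execution strategy varies. */
final class ConstantScoreRewriteSketch {

    enum Strategy { CONSTANT_SCORE_BOOLEAN, FILTER_UP_FRONT }

    // Assumed limits, chosen only for illustration.
    static final int MAX_BOOLEAN_CLAUSES = 1024;
    static final double DOC_FRACTION_CUTOFF = 0.10;

    /** Picks the execution strategy from rough per-query statistics. */
    static Strategy choose(int matchingTerms, long estimatedHits, long maxDoc) {
        if (matchingTerms > MAX_BOOLEAN_CLAUSES) {
            // Too many terms to expand into clauses: build the filter up front.
            return Strategy.FILTER_UP_FRONT;
        }
        if (maxDoc > 0 && (double) estimatedHits / maxDoc > DOC_FRACTION_CUTOFF) {
            // Broad query: the up-front filter is assumed to win here.
            return Strategy.FILTER_UP_FRONT;
        }
        // Narrow query: expand into (constant scoring) boolean clauses.
        return Strategy.CONSTANT_SCORE_BOOLEAN;
    }

    /** Every hit gets the query boost, whichever strategy ran. */
    static float score(Strategy strategy, float boost) {
        return boost;
    }

    public static void main(String[] args) {
        System.out.println(choose(12, 500, 1_000_000));
        System.out.println(choose(5_000, 500, 1_000_000));
        System.out.println(choose(12, 300_000, 1_000_000));
    }
}
```

Because score() ignores the strategy, the choice becomes purely a
performance decision that can be made query by query.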

Mike


Re: Lucene's default settings & back compatibility

Posted by Robert Muir <rc...@gmail.com>.

I am curious about this, do you think it's a better default because it avoids
the max boolean clauses problem? or because for a lot of these scoring
doesn't make much sense anyway?

I ran tests on a pretty big index, you pay a price for the constant
score/filter method. It's slower for the common case searches, it only starts
to win for queries that return > 10% or so of the index, but it's significantly
slower for narrow queries...

I'm just trying to imagine a case where queries that return > 10% or so of
the index are actually the common/default...?


>
>  * Constant score rewrite ought to be the default for most multi-term
>    queries
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, May 19, 2009 at 6:56 AM, DM Smith <dm...@gmail.com> wrote:

> I really like the idea of a settings class. Another benefit, *especially if
> it is documented well*, users would be led to tuning parameters.
>
> In this settings class, would there be setters/getters so that one could
> take particular defaults and tweak them? E.g. I like one default from 2.4
> but will take everything else from 3.0. Therefore, I use the 3.0 defaults,
> but change one of the settings to match 2.4, as in:
>
> LuceneSettings myDefaults = LuceneSettings.defaults3_0();
> myDefaults.setXXX(LuceneSettings.defaults2_4().getXXX());
> LuceneSettings.useDefaults(myDefaults);

Exactly -- one could start from SettingsMatchin24 and then cherry-pick
the 2.9 improvements a la carte.

And I agree on documentation: the settings should make it clear what
they do, what the default was in 2.4, why the default was changed in
2.9 (what the benefit/tradeoffs were), etc.
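
As a rough illustration of "start from the 2.4 defaults, then
cherry-pick", here is a small self-contained sketch. The class name, the
three settings and the factory methods are all hypothetical (no such
class exists in Lucene); the default values only mirror the examples
listed at the top of this thread.

```java
/** Hypothetical settings holder; not an actual Lucene class. */
final class LuceneSettingsSketch {
    private boolean readOnlyReader;
    private boolean stopFilterPositionIncrements;
    private boolean scoreWhenSortingByField;

    private LuceneSettingsSketch(boolean readOnlyReader,
                                 boolean stopFilterPositionIncrements,
                                 boolean scoreWhenSortingByField) {
        this.readOnlyReader = readOnlyReader;
        this.stopFilterPositionIncrements = stopFilterPositionIncrements;
        this.scoreWhenSortingByField = scoreWhenSortingByField;
    }

    /** Back-compat defaults: non-readOnly reader, no position holes, scoring on. */
    static LuceneSettingsSketch matching24() {
        return new LuceneSettingsSketch(false, false, true);
    }

    /** Assumed "improved" defaults, as argued for in this thread. */
    static LuceneSettingsSketch matching29() {
        return new LuceneSettingsSketch(true, true, false);
    }

    boolean isReadOnlyReader() { return readOnlyReader; }
    void setReadOnlyReader(boolean value) { readOnlyReader = value; }

    boolean isStopFilterPositionIncrements() { return stopFilterPositionIncrements; }
    void setStopFilterPositionIncrements(boolean value) { stopFilterPositionIncrements = value; }

    boolean isScoreWhenSortingByField() { return scoreWhenSortingByField; }
    void setScoreWhenSortingByField(boolean value) { scoreWhenSortingByField = value; }

    public static void main(String[] args) {
        // Start from 2.4 behavior, then take exactly one newer default.
        LuceneSettingsSketch mine = matching24();
        mine.setReadOnlyReader(matching29().isReadOnlyReader());
        System.out.println(mine.isReadOnlyReader());
        System.out.println(mine.isStopFilterPositionIncrements());
    }
}
```

Each getter/setter pair is also a natural place to hang the
documentation: what the setting does, what the old default was, and why
it changed.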

But we need to get this in for 2.9, so that starting w/ 3.x we are free.

Mike


Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

On May 19, 2009, at 6:39 AM, Michael McCandless wrote:

> On Mon, May 18, 2009 at 8:51 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Mon, May 18, 2009 at 5:06 PM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>> * StopFilter should enable position increments by default
>>
>> Is this one an actual improvement in the general case?
>> A query of "foo bar" then wouldn't match a document with "foo and
>> bar", but a query of "foo the bar" would.
>
> Well... I think I'd argue that this is an improvement, ie the query
> "foo bar" should not in fact match a doc with "foo and bar" (unless
> your PhraseQuery is using slop).  If you really want slop in your
> matching, you should just use slop.
>
> Query "foo the bar" will match document "foo and bar" in either case,
> so it's non-differentiating here.
>
> Also, it's bothersome that by default StopFilter throws away more
> information than it needs to.  Ie, it's already discarding words
> (that's its purpose) but the fact that it then also discards the holes
> left behind, by default, is not good, I think.
>
> I went and re-read http://issues.apache.org/jira/browse/LUCENE-1095.
> Since both QueryParser and StopFilter can now preserve position
> increments, I'd think we would want to change both to do so (in the
> *Settings classes)?
>
> (And, QueryParser is another great example where a *Settings class
> would give us much more freedom to fix its quirks w/o breaking back
> compat.)
>
> Anyway, this is a great debate, in that any defaults set in Lucene
> over time should be scrutinized, through discussions like this, rather
> than simply always forcefully left on their back-compat defaults.  The
> Settings class would give us this freedom.

I really like the idea of a settings class. Another benefit,
*especially if it is documented well*, users would be led to tuning
parameters.

In this settings class, would there be setters/getters so that one  
could take particular defaults and tweak them? E.g. I like one default  
from 2.4 but will take everything else from 3.0. Therefore, I use the  
3.0 defaults, but change one of the settings to match 2.4, as in:

LuceneSettings myDefaults = LuceneSettings.defaults3_0();
myDefaults.setXXX(LuceneSettings.defaults2_4().getXXX());
LuceneSettings.useDefaults(myDefaults);


-- DM



Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Mon, May 18, 2009 at 8:51 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Mon, May 18, 2009 at 5:06 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>>  * StopFilter should enable position increments by default
>
> Is this one an actual improvement in the general case?
> A query of "foo bar" then wouldn't match a document with "foo and
> bar", but a query of "foo the bar" would.

Well... I think I'd argue that this is an improvement, ie the query
"foo bar" should not in fact match a doc with "foo and bar" (unless
your PhraseQuery is using slop).  If you really want slop in your
matching, you should just use slop.

Query "foo the bar" will match document "foo and bar" in either case,
so it's non-differentiating here.

Also, it's bothersome that by default StopFilter throws away more
information than it needs to.  Ie, it's already discarding words
(that's its purpose) but the fact that it then also discards the holes
left behind, by default, is not good, I think.
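
The matching behavior described above can be checked with a toy model.
This is a simplified sketch in plain Java, not Lucene's StopFilter or
PhraseQuery: the stop word set, the whitespace tokenizer and the exact
(slop 0) phrase matcher are all assumptions made for illustration, and
the same analysis is applied to both the document and the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model of stop word removal with and without position holes. */
final class StopFilterPositionsSketch {

    // Assumed stop words, just enough for the example.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("and", "the"));

    static final class Token {
        final String term;
        final int position;
        Token(String term, int position) { this.term = term; this.position = position; }
    }

    /** Drops stop words; keeps the holes they leave only if asked to. */
    static List<Token> analyze(String text, boolean enablePositionIncrements) {
        List<Token> tokens = new ArrayList<>();
        int position = -1;
        int skipped = 0;
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            if (STOP_WORDS.contains(word)) { skipped++; continue; }
            position += 1 + (enablePositionIncrements ? skipped : 0);
            skipped = 0;
            tokens.add(new Token(word, position));
        }
        return tokens;
    }

    /** Exact phrase match (no slop) on the analyzed positions. */
    static boolean phraseMatches(String query, String doc, boolean enablePositionIncrements) {
        List<Token> q = analyze(query, enablePositionIncrements);
        List<Token> d = analyze(doc, enablePositionIncrements);
        if (q.isEmpty()) return false;
        Token first = q.get(0);
        for (Token start : d) {
            if (!start.term.equals(first.term)) continue;
            int shift = start.position - first.position;
            boolean allFound = true;
            for (Token wanted : q) {
                if (!containsAt(d, wanted.term, wanted.position + shift)) {
                    allFound = false;
                    break;
                }
            }
            if (allFound) return true;
        }
        return false;
    }

    private static boolean containsAt(List<Token> tokens, String term, int position) {
        for (Token t : tokens) {
            if (t.position == position && t.term.equals(term)) return true;
        }
        return false;
    }
}
```

In this model, "foo bar" matches the document "foo and bar" only when the
holes are discarded, while "foo the bar" matches it in both modes.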

I went and re-read http://issues.apache.org/jira/browse/LUCENE-1095.
Since both QueryParser and StopFilter can now preserve position
increments, I'd think we would want to change both to do so (in the
*Settings classes)?

(And, QueryParser is another great example where a *Settings class
would give us much more freedom to fix its quirks w/o breaking back
compat.)

Anyway, this is a great debate, in that any defaults set in Lucene
over time should be scrutinized, through discussions like this, rather
than simply always forcefully left on their back-compat defaults.  The
Settings class would give us this freedom.

Mike


Re: Lucene's default settings & back compatibility

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Mon, May 18, 2009 at 5:06 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
>  * StopFilter should enable position increments by default

Is this one an actual improvement in the general case?
A query of "foo bar" then wouldn't match a document with "foo and
bar", but a query of "foo the bar" would.

-Yonik


Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, May 19, 2009 at 9:34 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:

> Selecting backward compatibility vs latest and greatest could be done
> w/o Settings (a simple static int containing the version number to act
> like).  It seems like the Settings debate should be based on its own
> merits.

But isn't a static int too restrictive?  That means all usage of
Lucene from within this JRE must match that version?

Or... we could add a "simple int" version identifier to certain
classes' ctors?  So when you create a Lucene class, you'd pass in
matchinVersion integer (eg VERSION_24)?  This way each class
internally switches its defaults, and we wouldn't have a central place
where all of 2.4's defaults are stored?
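
A minimal sketch of the per-constructor variant, assuming made-up names:
the version constants, the StopFilterLike class and the idea that its
default flips at 2.9 are all hypothetical and only show the shape of the
approach.

```java
/** Hypothetical sketch: each instance picks its defaults from a version passed to its ctor. */
final class MatchVersionSketch {

    // Assumed version identifiers, for illustration only.
    static final int VERSION_24 = 24;
    static final int VERSION_29 = 29;

    /** Stand-in for any class whose default behavior changed between releases. */
    static final class StopFilterLike {
        private final boolean enablePositionIncrements;

        StopFilterLike(int matchVersion) {
            // The class itself decides how each version's default looks.
            this.enablePositionIncrements = matchVersion >= VERSION_29;
        }

        boolean getEnablePositionIncrements() {
            return enablePositionIncrements;
        }
    }

    public static void main(String[] args) {
        // Two instances in the same JVM can act like different releases.
        StopFilterLike legacy = new StopFilterLike(VERSION_24);
        StopFilterLike current = new StopFilterLike(VERSION_29);
        System.out.println(legacy.getEnablePositionIncrements());
        System.out.println(current.getEnablePositionIncrements());
    }
}
```

Unlike a single static int, nothing here is shared, so two applications
(or two parts of one application) in the same JVM need not agree on a
version.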

Or... something else?

Mike


Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

I like the idea of settings however it is implemented. With the  
blurring of core and contrib in the repackaging of Lucene, the issue  
of backward compatibility becomes more difficult. (maybe, I'm  
imagining problems where they don't exist.)

My concern with any of these mechanisms: codifying past behavior. What  
would be the expectation and policy regarding the keeping of such  
settings? Do these now become deprecated? Do we keep a 2.1 settings  
when we are releasing 2.4?

There is an even simpler solution, using existing policy: Frequent  
releases.
Would this be a big issue if we had frequent releases?

To go from 2.0 to 3.0 there is a 2.9 release, where the difference  
between the 2.9 release and the 3.0 release is the removal of  
deprecations. (Though with this release, it will be a bit bigger as it  
will also require Java 5.)

Every time we approach a release, there is a flurry of activity and  
the release gets pushed, for all practical purposes, indefinitely.

Pushed to absurdity: Only have x.0 (perhaps x.0.1) and x.9 releases.  
That is, don't have an x.1 minor release. And have releases once a week,
so that 2 times a month we have a major release. So twice a month we  
can break API compatibility and once a month we can break index  
compatibility.

The stability of the API over time is important to users. Having  
infrequent releases with a great product is a plus. (I'm really glad  
as I'm still stuck using Java 1.4!) Having the bridge via deprecation  
to newness is a great transitional help.

IMHO, the real challenge is to manage the release process. Managing
that will help manage backward compatibility.

If you were to look at the schedule for Fedora, Eclipse,  
OpenOffice, ..., you'd find that each has a release plan with distinct  
stages. At each stage there is a release (testing/alpha/beta/RC1/RC2/...).
As the release process is being entered, generally a release
branch is created. New development continues on trunk and something of
perceived value may be ported to the branch. At some point there is a
feature freeze and only bug fixes are accepted on the release branch.
Having a branch with parallel development is a very strong
encouragement to have a quick release, as it is a pain to have it.

-- DM

On May 20, 2009, at 7:22 AM, Michael McCandless wrote:

> On Tue, May 19, 2009 at 4:50 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>
>>> Right, that's exactly why I want to fix it (only one behavior allowed
>>> and so for all of 2.* we must match the 2.0 behavior).
>>
>> I meant one jar per JVM gives you one behavior (as is the case now).
>> But by setting a static actsAs version number, you could get a 2.* jar
>> to behave as if it were 2.0, even as behaviors evolve.
>
> So I think you're suggesting something like this: when you use Lucene,
> if you want "latest and greatest" defaults, do nothing.
>
> If instead you want defaults to match a particular past minor release,
> you must call (say) LuceneVersions.setVersion(VERSION_21).
>
> Any place inside Lucene that has defaults that need to vary by version
> would then check this, and act accordingly.
>
> I absolutely love the simplicity of this solution (far simpler than
> *Settings classes).  It would achieve what I'm aiming for, which is to
> always be free on every minor release to set the defaults for new
> users to the latest & greatest.
>
> But:
>
>  1) It means any usage of Lucene inside the JRE must share that same
>     version default
>
>  2) It's a change to our back-compat policy, in that it requires the
>     app to declare what version compatibility it requires.
>
> On #1, maybe this is in fact just fine, since as you pointed out
> that's de-facto what we have today; it's just that the "actsAs" is
> hardwired to 2.0 for all 2.x releases.
>
> On #2, I think shifting the burden onto those apps that do in fact
> need strict back-compat on upgrading, to have to set the actsAs is a
> good change to our policy.  After all, we think such users are the
> minority and putting the burden on new users of Lucene seems
> unreasonable.
>
> So net/net I'm +1!
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Wed, May 20, 2009 at 8:28 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:

>> So I think you're suggesting something like this: when you use Lucene,
>> if you want "latest and greatest" defaults, do nothing.
>>
>> If instead you want defaults to match a particular past minor release,
>> you must call (say) LuceneVersions.setVersion(VERSION_21).
>
> Either way would work - we could reverse it for stronger back compat if desired.
> For 3.0, and all 3.x releases, set actsAsVersion=30000 by default in Lucene.

Right, we could go either way, so it's a policy decision.  (But I
favor changing our policy so users that require back-compat must call
setActsAs=XXX).

> A program could set actsAsVersion=LUCENE_VERSION_ANY (999999) and
> always get new behavior,
> or just choose the specific version they are using to test/develop
> with; actsAsVersion=30201 to get the behavior changes of 3.2.1
>
> But since 3.0 is a major release anyway, we could change the default
> of actsAsVersion with each 3.x release (or just set it to 39999) and
> require that users set actsAsVersion=30000 (or whatever version they
> are on) in order to get maximum back compatibility.
>
> For 2.9, we could start changing behavior and default
> actsAsVersion=20401 (or 20499?) to act like the latest 2.4.x release.

+1

So, in 2.9 we introduce this new way of managing Lucene's back compat
defaults, but there's no policy change, because actsAsVersion defaults
to 20099 (2.0.x).  This simply provides the machinery but does not
change the back-compat policy.  Users can upgrade to 2.9 like normal
and Lucene will act as 2.0.

Then, in 3.0 we default actsAsVersion to 39999, which is a policy
change, meaning if back-compat is important to you, your app should
set actsAsVersion accordingly.
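
For illustration, the static actsAsVersion idea could look roughly like
the sketch below. The class and method names are hypothetical; only the
numeric encoding (30201 for 3.2.1, 999999 for "any") and the 20099
starting default are taken from the messages above.

```java
/** Hypothetical sketch of a JVM-wide "acts as" version switch. */
final class ActsAsVersionSketch {

    /** "Always give me the newest behavior", using the value suggested above. */
    static final int LUCENE_VERSION_ANY = 999999;

    // Assumed initial default: behave like the 2.0.x line.
    private static volatile int actsAsVersion = 20099;

    /** Encodes major.minor.patch as major*10000 + minor*100 + patch. */
    static int encode(int major, int minor, int patch) {
        return major * 10000 + minor * 100 + patch;
    }

    static void setActsAsVersion(int version) {
        actsAsVersion = version;
    }

    static int getActsAsVersion() {
        return actsAsVersion;
    }

    /** Code whose default changed in a given release would check this. */
    static boolean actsAsOnOrAfter(int major, int minor, int patch) {
        return actsAsVersion >= encode(major, minor, patch);
    }

    public static void main(String[] args) {
        System.out.println(actsAsOnOrAfter(2, 4, 0));
        setActsAsVersion(LUCENE_VERSION_ANY);
        System.out.println(actsAsOnOrAfter(3, 2, 1));
    }
}
```

The static field is what makes this simple, and also what forces every
user of Lucene in the same JVM onto one version.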

> And we could still leisurely proceed with Settings classes where they
> made sense.

Right.

Mike


Re: Lucene's default settings & back compatibility

Posted by Mark Miller <ma...@gmail.com>.

Earwin Burrfoot wrote:
>> That said, I see the points and value of relaxing the back compat policy as well. It's been discussed a lot in the past, and it has been eased in the past.
>>     
> Afraid to ask which additional shackles Lucene bore in the past. 
:) The easing wasn't that much. Offhand, I remember Grant asking that
we be allowed to change Field classes. There is an exception or two like
that.
Not much for sure. There is generally much more discussion than action
in Lucene. A whirlwind like this will kick up, and as often as things
change (probably more often), nothing happens.

Lucene dev has seen activity recently like it's never seen before though.
So change is bound to come one way or another as more and more chefs 
start dipping their ideas and hands in.

-- 
- Mark

http://www.lucidimagination.com





Re: Lucene's default settings & back compatibility

Posted by Earwin Burrfoot <ea...@gmail.com>.

> That said, I see the points and value of relaxing the back compat policy as well. It's been discussed a lot in the past, and it has been eased in the past.
Afraid to ask which additional shackles Lucene bore in the past. I
mean, 'what' has to be eased to produce policies we have right now?
Joking, just really happy that something is seemingly going to change.

> When the flood gates open, and code is rolling all over the place, upgrading Lucene becomes less of a buffet and more of a pain in the a**
Really, I've got a major pain in the *s* upgrading from 2.4 to trunk
(2.9). I upgraded to get per-segment collection and had to rewrite my
nontrivial collectors - no back compat effort could save me from it.

So, where to cast my weightless vote? :)

> We still should balance the "cost" of non back-compatible changes with
> the benefits.  As Doug has said: "Lucene has a large install base.  A
> little effort towards back-compatibility on our part saves folks a lot
> of effort."
That's a good approach.
Renaming a method, changing/adding some constructor parameters is
really easy, you don't need to keep old things around.
Doing deletes/norm updates through IndexWriter instead of IndexReader
- that's more work, but it's not complex.
Going from old Analyzer API to new one, or HitCollector -> Collector,
that's where real pain starts, because API changed dramatically not
only in its form, but in its meaning too. So a back-compat layer there
is reasonable.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


Re: Lucene's default settings & back compatibility

Posted by Mark Miller <ma...@gmail.com>.

Shai Erera wrote:
>
>     That's not really the style Lucene has taken in the past :)
>
>
> Is it a back-compat policy? Maybe it's time to change that too ;) 
> (kidding)
>
> Shai
>
> On Wed, May 20, 2009 at 11:39 PM, Mark Miller <ma...@gmail.com> wrote:
>
>     That's not really the style Lucene has taken in the past :)
>
>
Hey, it might be. Anything can change. Just have to be careful about
changing a winning formula. Don't want it getting stale either.

-- 
- Mark

http://www.lucidimagination.com





Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

>
> That's not really the style Lucene has taken in the past :)


Is it a back-compat policy? Maybe it's time to change that too ;) (kidding)

Shai

On Wed, May 20, 2009 at 11:39 PM, Mark Miller <ma...@gmail.com> wrote:

> That's not really the style Lucene has taken in the past :)
>

Re: Lucene's default settings & back compatibility

Posted by Mark Miller <ma...@gmail.com>.

Shai Erera wrote:
>
>     When the flood gates open, and code is rolling all over the place,
>     upgrading Lucene becomes less of a buffet and more of a pain in
>     the a**
>
>
> I slightly disagree with that. If I'm a 2.2 user and I silently 
> upgraded to 2.4, 2.4.1 and 2.9 I will have loads of work to do when I 
> come to upgrade to 3.0 (because for me, 3.0 is exactly as if the gates 
> just opened).
That's a good point - eventually you have to face the piper and move or
not. But it's kind of nicely isolated to a point every couple years (we
have talked about releasing more often in the past, and I hope that 
comes up again). You can ride bug fixes for a whole version release (a 
couple years). With the new way, you can get the first bug fix release, 
but then you will quickly be left out of new bug fixes until you update 
your code.
>
> The way I see it, I *should* fully upgrade to 2.4, in order to spare 
> me the work when I upgrade to 3.0. By the time 3.0 is out I may have 
> so many changes to handle that I might re-consider upgrading at all.
You have both options today though, right?
> Today, I believe users are not so silently upgrading, but prepare 
> themselves for the future. Even if they don't take advantage of new 
> defaults, they at least get rid of deprecated code because that will
> surely change in the next major release, so why wait?
It would be nice to know these types of things more conclusively.
>
> A personal example - I wrote an Analyzer which includes lots of code 
> (lots of TokenFilters, Tokenizers etc.). Then I see that the whole 
> TokenStream API is deprecated and will be replaced. Do I have to 
> change the code right-away - NO. But I will do it because why wait for 
> 3.0? When 3.0 is out I will have many more things to do. I prefer
> incremental changes to my code over a complete overhaul (better
> testing-wise also) (this is a true example, not something I'm making up).
We are not always so in control of our time though :) The new 
TokenStream API is a bit confusing - I'd have held off upgrading. I 
think there is even Lucene code (tests?) that uses the old API. There is 
tons of test code using deprecated APIs. I don't think most people move
immediately, because most people are lazy ;) Or time constrained. I'm 
not fully in either camp though. Frankly, I'd argue either side and hope 
smarter people figure out the right choice ;)
>
> It is true that currently you can decide when you want to make 
> revisions to your code, but in reality I wonder how common it is.
>
> One way to check it (other than doing a survey) is to change the 
> policy and see how many scream at us :)
That's not really the style Lucene has taken in the past :)
>
> Shai


-- 
- Mark

http://www.lucidimagination.com





Re: Lucene's default settings & back compatibility

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Wed, May 20, 2009 at 4:31 PM, Shai Erera <se...@gmail.com> wrote:
> A personal example - I wrote an Analyzer which includes lots of code (lots
> of TokenFilters, Tokenizers etc.). Then I see that the whole TokenStream API
> is deprecated and will be replaced.

Yeah, that one is going to be causing some headaches in Solr-dev-land
(but at least it should be at the -dev level and not the -user level).

We still should balance the "cost" of non back-compatible changes with
the benefits.  As Doug has said: "Lucene has a large install base.  A
little effort towards back-compatibility on our part saves folks a lot
of effort."

-Yonik
http://www.lucidimagination.com


Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

>
> When the flood gates open, and code is rolling all over the place,
> upgrading Lucene becomes less of a buffet and more of a pain in the a**
>

I slightly disagree with that. If I'm a 2.2 user and I silently upgraded to
2.4, 2.4.1 and 2.9 I will have loads of work to do when I come to upgrade to
3.0 (because for me, 3.0 is exactly as if the gates just opened).

The way I see it, I *should* fully upgrade to 2.4, in order to spare me the
work when I upgrade to 3.0. By the time 3.0 is out I may have so many
changes to handle that I might re-consider upgrading at all.
Today, I believe users are not so silently upgrading, but prepare themselves
for the future. Even if they don't take advantage of new defaults, they at
least get rid of deprecated code because that will surely change in the
next major release, so why wait?

A personal example - I wrote an Analyzer which includes lots of code (lots
of TokenFilters, Tokenizers etc.). Then I see that the whole TokenStream API
is deprecated and will be replaced. Do I have to change the code right-away
- NO. But I will do it because why wait for 3.0? When 3.0 is out I will have
many more things to do. I prefer incremental changes to my code over a
complete overhaul (better testing-wise also) (this is a true example, not
something I'm making up).

It is true that currently you can decide when you want to make revisions to
your code, but in reality I wonder how common it is.

One way to check it (other than doing a survey) is to change the policy and
see how many scream at us :)

Shai

On Wed, May 20, 2009 at 11:17 PM, Mark Miller <ma...@gmail.com> wrote:

>
>
> Earwin Burrfoot wrote:
>> See, you upgrade either for new features, or for performance
>> improvements. You have to write code for former, and you have to write
>> code for the latter (because by default most of them are off).
>>
>> Mark Miller:
>>> If you have upgraded Lucene over the years and you never touched code to
>>> tweak performance, you still got fantastic performance improvements. You
>>> just didn't get them all.
>>
>> If you never touched the code over the years, your project is probably
>> already dead
>>
> Doesn't alter the point though. You claimed that you missed the performance
> benefits if you upgraded Lucene, but you did not; regardless of if your
> project is dead, Lucene, with defaults, has seen large performance
> improvements over the years.
>
> Many healthy projects have components of working code that work as needed
> and are rarely touched. Should we be bending over backwards so that those
> users can plug in a speed improvement a year or two down the line with no
> hassle? That's a different argument - one that's happened many times over the
> years on this list. But users did see fantastic performance improvements
> without changing code regardless.
>
> To the point of having to change a lot of code - right now you can easily
> pick and choose new features, defaults, and usually, upgrading lucene is
> fairly leisurely. When the flood gates open, and code is rolling all over
> the place, upgrading Lucene becomes less of a buffet and more of a pain in
> the a**. That said, I see the points and value of relaxing the back compat
> policy as well. It's been discussed a lot in the past, and it has been eased
> in the past.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: Lucene's default settings & back compatibility

Posted by Mark Miller <ma...@gmail.com>.

Earwin Burrfoot wrote:
> See, you upgrade either for new features, or for performance
> improvements. You have to write code for former, and you have to write
> code for the latter (because by default most of them are off).
>
> Mark Miller:
>> If you have upgraded Lucene over the years and you never touched code to tweak performance, you still got fantastic performance improvements. You just didn't get them all.
>
> If you never touched the code over the years, your project is probably
> already dead
Doesn't alter the point though. You claimed that you missed the
performance benefits if you upgraded Lucene, but you did not; regardless
of whether your project is dead, Lucene, with defaults, has seen large
performance improvements over the years.

Many healthy projects have components of working code that work as
needed and are rarely touched. Should we be bending over backwards so
that those users can plug in a speed improvement a year or two down the
line with no hassle? That's a different argument - one that's happened
many times over the years on this list. But users did see fantastic
performance improvements without changing code regardless.

To the point of having to change a lot of code - right now you can
easily pick and choose new features, defaults, and usually, upgrading
Lucene is fairly leisurely. When the flood gates open, and code is
rolling all over the place, upgrading Lucene becomes less of a buffet
and more of a pain in the a**. That said, I see the points and value of
relaxing the back compat policy as well. It's been discussed a lot in the
past, and it has been eased in the past.

-- 
- Mark

http://www.lucidimagination.com


Re: Lucene's default settings & back compatibility

Posted by Earwin Burrfoot <ea...@gmail.com>.

Mark Miller:
> If you have upgraded Lucene over the years and you never touched code to tweak performance, you still got fantastic performance improvements. You just didn't get them all.
If you never touched the code over the years, your project is probably
already dead.

Shai Erera:
> Exactly ! which is why I think we should relax the back-compat policy "a
> bit".
Index compatibility across versions is verily important, that I can't
argue with.
Drop-in compatibility between bugfix releases, absolutely.
Spare me throngs of deprecated stuff and should-be-dead code on major
releases. And a major release is any release that takes more than half a
year, no matter which part of the version number you increment.

> And ... (I realize it's going to complicate things a bit) we could also
> decide to have dot release for bug fixes, like we had 2.4.1. So let's say
> when 3.4 comes (3-4 years from now :) ). In 3.6 we don't preserve any
> back-compat. If there is a bug, we fix it on 3.6 and also on a 3.4.1 branch.
> Those that just want to take the bug fixes can upgrade to 3.4.1. Those that
> upgrade to 3.6 get the bug fixes and all the rest of the changes done, so
> they should be ready to change their code.
+1. That's how the rest of the world does it.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


Re: Lucene's default settings & back compatibility

Posted by Mark Miller <ma...@gmail.com>.

Shai Erera wrote:
>
> +1 (am I allowed to cast +1s not being a committer?) :)
Absolutely :) When push comes to shove, you don't even have a binding vote
as a committer. Only members of the PMC have binding votes.

You have as much voting power as a committer, as long as you have as much
of an ability to sway the PMC votes, if/when it comes down to it.

They are likely to be monitoring the community votes in making their
decision.


-- 
- Mark

http://www.lucidimagination.com





Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

Earwin - I wrote it before - index structure is the only back-compat policy
I propose to keep, and not for just one major release, but for 2 (which I
believe is the current behavior already). I also absolutely don't want to
give that up.

> I think it's not unreasonable to say "if you want to take advantage of
> Lucene's perf improvements and new features, on upgrading you'll have
> to recompile, fix APIs, etc.".

+1 (am I allowed to cast +1s not being a committer?) :)

On Wed, May 20, 2009 at 11:06 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Wed, May 20, 2009 at 3:24 PM, Shai Erera <se...@gmail.com> wrote:
> > Then why go through all this trouble and not simply change the back-compat
> > policy?
>
> OK so let's talk policy now ;)
>
> We need some serious relaxing of the back-compat policy to make the
> actsAsVersion proposal pointless.
>
> Ie whenever we want to change a default, eg sorting by field should
> not compute scores, IndexWriter should suddenly default autoCommit to
> false, IndexReader.open gives you a readOnly reader, MultiTermQuery is
> constant score by default (once we fix BQ to do constant score), docs
> are scored out-of-order by BQ, stop filter preserves positions, etc.,
> we need to be "allowed" (by our policy) to make such changes in the next
> dot release.
>
> I want new users on every dot-release to always get the
> latest&greatest defaults.  Every change we make needs to be free to
> adopt the best defaults.
>
> If we relax our policy enough so that we have full freedom to set
> defaults only according to new users, then I agree actsAsVersion is
> not needed.
>
> Back-compat is insanely costly, especially the longer it takes us to
> get to the next major release...  yet, the specific cost that bothers
> me the most is that we hurt our new users because of the back-compat
> users.  It hurts Lucene's adoption/growth.
>
> Another consideration on relaxing policy is that back-compat is well
> nigh impossible to actually achieve.  We spend an insane amount of our
> energy maintaining back-compat, but then one accidental breakage that
> slips through quickly causes many back-compat users to conclude we are
> not back-compat.  It's not much bang and a lot of buck.
>
> It is tempting to change our policy to something like:
>
>  * Bug fixes only on each 2.4.X release
>
>  * Anything can change on each 2.X release, but any prior 2.Y index
>    format is readable
>
> I think it's not unreasonable to say "if you want to take advantage of
> Lucene's perf improvements and new features, on upgrading you'll have
> to recompile, fix APIs, etc.".
>
> Mike
>
>
>

Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

Exactly ! which is why I think we should relax the back-compat policy "a
bit".

And ... (I realize it's going to complicate things a bit) we could also
decide to have dot releases for bug fixes, like we had 2.4.1. So let's say
3.4 comes out (3-4 years from now :) ), and in 3.6 we don't preserve any
back-compat. If there is a bug, we fix it on 3.6 and also on a 3.4.1 branch.
Those that just want to take the bug fixes can upgrade to 3.4.1. Those that
upgrade to 3.6 get the bug fixes and all the rest of the changes done, so
they should be ready to change their code.

I don't think we should maintain 3.4.1-like releases for too many releases
though. I.e., as soon as 3.6 is out, no bugs are fixed on the 3.4 branch,
just 3.6.

On Wed, May 20, 2009 at 10:33 PM, Earwin Burrfoot <ea...@gmail.com> wrote:

> In fact, there's no reason to upgrade Lucene (save for bugfixes), if
> you absolutely require a drop-in jar, and don't want to touch any of
> your code.
> See, you upgrade either for new features, or for performance
> improvements. You have to write code for former, and you have to write
> code for the latter (because by default most of them are off). So, if
> you're not ready to patch your app, you don't get any of this, so why
> bother upgrading at all?
>
> On Wed, May 20, 2009 at 23:24, Shai Erera <se...@gmail.com> wrote:
> > Then why go through all this trouble and not simply change the back-compat
> > policy?
> >
> > Really, I read some of Grant's responses and I realize that I've upgraded to
> > 2.4 way too long ago. 2.9 is nowhere in sight. It takes a lot of time to
> > release and during that time there's lots of discussions on the mailing
> > list, lots of issues and so on. What I'm trying to say is that with the
> > amount of communication on this mailing list, people have a lot of
> > opportunities to pick up changes, in addition to the CHANGES file.
> >
> > In 2.9 we're breaking back-compat, with those "Changes in backward
> > compatibility" section in CHANGES. So that makes it 2.4 and 2.9 in a row
> > where back-compat was not delivered as promised.
> >
> > And how radical is it to ask people to update their code when they upgrade?
> > Yes, if we were releasing every month, like was suggested previously, I can
> > understand why it's important. But we're not. So changing my code every 6-9
> > months is not that bad. Most chances I'll change my code because of other
> > things, not just Lucene.
> >
> > To me, all this Settings class (or actsAsVersion) will only complicate
> > things. If I understand correctly, than in 2.9 we'll have the code
> > defaulting to "actAs29", with the ability to change it to "actAs24". Doesn't
> > that mean I need to update my code if I want to retain 2.4 behavior? If I
> > already touch my code, how complicated is it to really match my app to 2.9?
> > I mean, how many people write Collectors, and among those - how many
> > Collectors do they write? We've gone through a hell lot of discussions in
> > 1575 just to protect those who still use HitCollector, but I'm not sure how
> > many users we actually protected.
> >
> > First, I think we should seriously consider to drop the "jar drop-in
> > ability" requirement. I don't see any benefits from doing that, except for
> > bug fixes. Second, usually the changes in runtime behavior is for improving
> > things (such as performance) - so I don't see why we can't ask someone
> > upgrading to a newer version to take advantage of those improvements.
> >
> > Grant suggested we discuss the back-compat policy, since if we resolve that
> > we might not need Settings or actAs solution. I agree with that proposal. If
> > we can relax our back-compat policy to the point of just the index structure
> > (since between us, that's the most expensive thing you can hit when
> > upgrading a Lucene version) then I don't think we need these Settings/actAs
> > approaches.
> >
> > And BTW, the code today is already packed with deprecated methods, which
> > neither Settings nor actAs will solve. So even by adopting new defaults,
> > we'll still have troubles with back-compat, since we'll need to deprecate
> > methods/classes and worse - find alternative names !
> >
> > We could also decide to have X.0, X.5 and X+1.0 as point releases where
> > back-compat changes (removing deprecated methods and changing defaults).
> > That way we'll keep everybody happy, w/o needing to add Settings/actAs or
> > wait 1-2 years before a major release is out.
> >
> > Shai
> >
> > On Wed, May 20, 2009 at 10:10 PM, Michael McCandless
> > <lu...@mikemccandless.com> wrote:
> >>
> >> On Wed, May 20, 2009 at 12:55 PM, Andi Vajda <va...@osafoundation.org>
> >> wrote:
> >>
> >> > I've been watching this thread with interest with my opinion swaying
> >> > back and forth.
> >>
> >> So have I!
> >>
> >> > This last comment, though, pushes me to favor the settings class idea
> >> > because that idea came with the promise of eliminating the combinatorial
> >> > explosion of constructor and method overloads.
> >> >
> >> > In addition, I very much like the idea of having one place list all the
> >> > coherent configuration choices one can make. No, CHANGES.txt is not it.
> >> > While it's interesting reading, it reads like a blog. It doesn't tie
> >> > sensible settings together. It only gives a differential and
> >> > chronological view of changes.
> >> >
> >> > Having version-specific settings classes is a really neat place to list
> >> > all possible settings in one place with sensible and coherent values for
> >> > a version.
> >>
> >> The thing is... the number of settings will be large over time, and so
> >> we'll need a hierarchy of classes, or we fallback to Properties w/ the
> >> hierarchy encoded in the string, but then you have a weakly typed API,
> >> and you lose the self-documenting (like Grant observed).
> >>
> >> Ie, in theory I love the idea of Settings, but in practice, as I start
> >> to think about the realities of implementing it, I realize it's gonna
> >> be a big challenge to solve it well.  This goes waaay beyond resolving
> >> the back-compat vs new users conflict we have today.
> >>
> >> Pushing to the way future, I'm also not convinced it's great that I
> >> have to go to two places (IndexWriter and its *Settings counterpart)
> >> to manage my "IndexWriter".
> >>
> >> I think the idea can work, but I'm realizing it's a huuuge project (vs
> >> actsAsVersion which is quite simple).
> >>
> >> > The same idea could be used for other things than version by the
> >> > way. It could help in picking one side of a configuration trade off over
> >> > another.
> >> >
> >> > For example:
> >> >   - a settings for favoring speed of updates over speed of queries if
> >> >     that makes sense
> >> >   - a settings for favoring index size over indexing speed
> >> >   ... and so on.
> >>
> >> Right -- Solr is discussing this now, too.  I think this would be
> >> useful.
> >>
> >> > I don't see why this has to be limited just to Lucene version backwards
> >> > compatibility.
> >>
> >> I think we should do "actsAsVersion" today, solely to resolve the
> >> back-compat vs new users conflict, and continue to explore/discuss
> >> Settings for these other reasons.
> >>
> >> > Oh, and about that: I think we've reached the breaking point
> >> > about backwards compatibility support a while ago. I recently hit a bug
> >> > in my code where a commit() call was missing. Before 2.4, flushing the
> >> > index committed it. Starting with 2.4, this is no longer the case. Yes,
> >> > this is documented and that helped me fix the bug really quickly but
> >> > backwards compatible it is not.
> >>
> >> Hmm -- I think we should have had flush() just call commit().
> >>
> >> > My point here is that we've promised too much
> >> > backwards compatibility for too long and it's been getting too hard to
> >> > deliver that promise now.
> >>
> >> I think it's high time we release 3.0 then!
> >>
> >> Mike
> >>
> >>
> >
> >
>
>
>
> --
> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> ICQ: 104465785
>
>
>

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, May 21, 2009 at 7:21 AM, Shai Erera <se...@gmail.com> wrote:
> I thought that the index file format is supposed to be supported until the
> 2nd major release. I.e. 3.0 will still read 2.0 indexes, but 4.0 won't. Is
> that what you meant, or am I wrong?

Woops, you're correct:

   http://wiki.apache.org/jakarta-lucene/BackwardsCompatibility

Mike


Re: Lucene's default settings & back compatibility

Posted by Mark Miller <ma...@gmail.com>.

Earwin Burrfoot wrote:
> See, you upgrade either for new features, or for performance
> improvements. You have to write code for former, and you have to write
> code for the latter (because by default most of them are off). 
That's not completely true. If you have upgraded Lucene over the years
and you never touched code to tweak performance, you still got fantastic
performance improvements. You just didn't get them all.

-- 
- Mark

http://www.lucidimagination.com





Re: Lucene's default settings & back compatibility

Posted by Earwin Burrfoot <ea...@gmail.com>.

In fact, there's no reason to upgrade Lucene (save for bugfixes), if
you absolutely require a drop-in jar, and don't want to touch any of
your code.
See, you upgrade either for new features, or for performance
improvements. You have to write code for the former, and you have to write
code for the latter (because by default most of them are off). So, if
you're not ready to patch your app, you don't get any of this, so why
bother upgrading at all?

On Wed, May 20, 2009 at 23:24, Shai Erera <se...@gmail.com> wrote:
> Then why go through all this trouble and not simply change the back-compat
> policy?
>
> Really, I read some of Grant's responses and I realize that I've upgraded to
> 2.4 way too long ago. 2.9 is nowhere in sight. It takes a lot of time to
> release and during that time there's lots of discussions on the mailing
> list, lots of issues and so on. What I'm trying to say is that with the
> amount of communication on this mailing list, people have a lot of
> opportunities to pick up changes, in addition to the CHANGES file.
>
> In 2.9 we're breaking back-compat, with those "Changes in backward
> compatibility" section in CHANGES. So that makes it 2.4 and 2.9 in a row
> where back-compat was not delivered as promised.
>
> And how radical is it to ask people to update their code when they upgrade?
> Yes, if we were releasing every month, like was suggested previously, I can
> understand why it's important. But we're not. So changing my code every 6-9
> months is not that bad. Most chances I'll change my code because of other
> things, not just Lucene.
>
> To me, all this Settings class (or actsAsVersion) will only complicate
> things. If I understand correctly, than in 2.9 we'll have the code
> defaulting to "actAs29", with the ability to change it to "actAs24". Doesn't
> that mean I need to update my code if I want to retain 2.4 behavior? If I
> already touch my code, how complicated is it to really match my app to 2.9?
> I mean, how many people write Collectors, and among those - how many
> Collectors do they write? We've gone through a hell lot of discussions in
> 1575 just to protect those who still use HitCollector, but I'm not sure how
> many users we actually protected.
>
> First, I think we should seriously consider to drop the "jar drop-in
> ability" requirement. I don't see any benefits from doing that, except for
> bug fixes. Second, usually the changes in runtime behavior is for improving
> things (such as performance) - so I don't see why we can't ask someone
> upgrading to a newer version to take advantage of those improvements.
>
> Grant suggested we discuss the back-compat policy, since if we resolve that
> we might not need Settings or actAs solution. I agree with that proposal. If
> we can relax our back-compat policy to the point of just the index structure
> (since between us, that's the most expensive thing you can hit when
> upgrading a Lucene version) then I don't think we need these Settings/actAs
> approaches.
>
> And BTW, the code today is already packed with deprecated methods, which
> neither Settings nor actAs will solve. So even by adopting new defaults,
> we'll still have troubles with back-compat, since we'll need to deprecate
> methods/classes and worse - find alternative names !
>
> We could also decide to have X.0, X.5 and X+1.0 as point releases where
> back-compat changes (removing deprecated methods and changing defaults).
> That way we'll keep everybody happy, w/o needing to add Settings/actAs or
> wait 1-2 years before a major release is out.
>
> Shai
>
> On Wed, May 20, 2009 at 10:10 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>>
>> On Wed, May 20, 2009 at 12:55 PM, Andi Vajda <va...@osafoundation.org>
>> wrote:
>>
>> > I've been watching this thread with interest with my opinion swaying
>> > back and forth.
>>
>> So have I!
>>
>> > This last comment, though, pushes me to favor the settings class idea
>> > because that idea came with the promise of eliminating the combinatorial
>> > explosion of constructor and method overloads.
>> >
>> > In addition, I very much like the idea of having one place list all the
>> > coherent configuration choices one can make. No, CHANGES.txt is not it.
>> > While it's interesting reading, it reads like a blog. It doesn't tie
>> > sensible settings together. It only gives a differential and
>> > chronological view of changes.
>> >
>> > Having version-specific settings classes is a really neat place to list
>> > all possible settings in one place with sensible and coherent values for
>> > a version.
>>
>> The thing is... the number of settings will be large over time, and so
>> we'll need a hierarchy of classes, or we fallback to Properties w/ the
>> hierarchy encoded in the string, but then you have a weakly typed API,
>> and you lose the self-documenting (like Grant observed).
>>
>> Ie, in theory I love the idea of Settings, but in practice, as I start
>> to think about the realities of implementing it, I realize it's gonna
>> be a big challenge to solve it well.  This goes waaay beyond resolving
>> the back-compat vs new users conflict we have today.
>>
>> Pushing to the way future, I'm also not convinced it's great that I
>> have to go to two places (IndexWriter and its *Settings counterpart)
>> to manage my "IndexWriter".
>>
>> I think the idea can work, but I'm realizing it's a huuuge project (vs
>> actsAsVersion which is quite simple).
>>
>> > The same idea could be used for other things than version by the
>> > way. It could help in picking one side of a configuration trade off over
>> > another.
>> >
>> > For example:
>> >   - a settings for favoring speed of updates over speed of queries if
>> >     that makes sense
>> >   - a settings for favoring index size over indexing speed
>> >   ... and so on.
>>
>> Right -- Solr is discussing this now, too.  I think this would be
>> useful.
>>
>> > I don't see why this has to be limited just to Lucene version backwards
>> > compatibility.
>>
>> I think we should do "actsAsVersion" today, solely to resolve the
>> back-compat vs new users conflict, and continue to explore/discuss
>> Settings for these other reasons.
>>
>> > Oh, and about that: I think we've reached the breaking point
>> > about backwards compatibility support a while ago. I recently hit a bug
>> > in my code where a commit() call was missing. Before 2.4, flushing the
>> > index committed it. Starting with 2.4, this is no longer the case. Yes,
>> > this is documented and that helped me fix the bug really quickly but
>> > backwards compatible it is not.
>>
>> Hmm -- I think we should have had flush() just call commit().
>>
>> > My point here is that we've promised too much
>> > backwards compatibility for too long and it's been getting too hard to
>> > deliver that promise now.
>>
>> I think it's high time we release 3.0 then!
>>
>> Mike
>>
>>
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

I thought that the index file format is supposed to be supported until the
2nd major release. I.e. 3.0 will still read 2.0 indexes, but 4.0 won't. Is
that what you meant, or am I wrong?

Shai

On Thu, May 21, 2009 at 2:17 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> OK so it sounds like we've boiled the proposal down to two concrete
> changes to the back-compat policy:
>
>  1) Default settings can change; we will always choose defaults based
>     on "latest & greatest for new users".  This only affects "runtime
>     behavior".  EG in 2.9, when sorting by field you won't get scores
>     by default.  When we do this we should clearly document the
>     change, and what settings one could use to get back to the old
>     behavior, in CHANGES.txt.
>
>  2) An API, once released as deprecated, is fair game to be removed
>     in the next minor release.
>
> We still only make bug fixes on point releases, support the index file
> format until the next major release -- those don't change.
>
> Mike
>
> On Wed, May 20, 2009 at 11:34 PM, Shai Erera <se...@gmail.com> wrote:
> >> With the new way, you can get the first bug fix release, but then you will
> >> quickly be left out of new bug fixes until you update your code.
> >
> > Mark, apologies for the late reference, but it struck me only after I left
> > the computer yesterday. Again, I'm not sure how big of a problem it is.
> > Suppose that I upgrade to 2.4 and the next version (6 months?) is 2.9. Then
> > a bug is discovered and is fixed in 2.4.1 and 2.9. In addition, 2.9 contains
> > all these changes in Collectors. When 2.9 is out I decide not to upgrade to
> > 2.9 because I don't have time. When 3.0 comes out (3-4 months later?) I will
> > be forced to upgrade. That means ~1 year since I last upgraded my Lucene
> > code sections.
> > (True, if there will be any bug fixes in 2.9, I will miss them).
> >
> > How unreasonable is it to ask this? Seriously, how many apps are not touched
> > *at all* during one year? And even if these code segments are stable and no
> > one touches them anymore, how problematic is it to request users to once a
> > year do a sort of cleanup and update to their code?
> >
> >> In other words, we keep deprecated around for only one or two versions.
> >
> > That is a reasonable approach. Meaning, defaults may change between releases
> > because we'd like Lucene users to get the latest & greatest stuff, (and also
> > count on the fact not so many out there strongly rely on the defaults?) but
> > methods removal/rename should cause a little more trouble, so we can give
> > the users one release to arrange their time before they have to do anything.
> >
> > Maybe the TokenStream API needs to stay deprecated for longer, until we're
> > sure everybody is happy with the new API.
> >
> > Shai
> >
> > On Thu, May 21, 2009 at 4:23 AM, Grant Ingersoll <gs...@apache.org>
> > wrote:
> >>
> >> On May 20, 2009, at 4:06 PM, Michael McCandless wrote:
> >>
> >>> On Wed, May 20, 2009 at 3:24 PM, Shai Erera <se...@gmail.com> wrote:
> >>>>
> >>>> Then why go through all this trouble and not simply change the
> >>>> back-compat policy?
> >>>
> >>> Back-compat is insanely costly, especially the longer it takes us to
> >>> get to the next major release...  yet, the specific cost that bothers
> >>> me the most is that we hurt our new users because of the back-compat
> >>> users.  It hurts Lucene's adoption/growth.
> >>>
> >>> Another consideration on relaxing policy is that back-compat is well
> >>> nigh impossible to actually achieve.  We spend an insane amount of our
> >>> energy maintaining back-compat, but then one accidental breakage that
> >>> slips through quickly causes many back-compat users to conclude we are
> >>> not back-compat.  It's not much bang and a lot of buck.
> >>>
> >>> It is tempting to change our policy to something like:
> >>>
> >>>  * Bug fixes only on each 2.4.X release
> >>>
> >>>  * Anything can change on each 2.X release, but any prior 2.Y index
> >>>   format is readable
> >>>
> >>> I think it's not unreasonable to say "if you want to take advantage of
> >>> Lucene's perf improvements and new features, on upgrading you'll have
> >>> to recompile, fix APIs, etc.".
> >>
> >>
> >> All reasonable, Mike.  My take is that Lucene has always been pragmatic
> >> about darn near everything, except back compat, where we are pretty
> >> dogmatic.
> >>
> >> In general, I think it is reasonable to say that even from 2.x to 2.y we
> >> will try to be back compatible, but when we deem it necessary, we reserve
> >> the right to change things.  I don't think anyone here is suggesting we
> >> would ever do something drastic like a complete overhaul of all the APIs in
> >> a version change.  I also think it is reasonable to deprecate things by
> >> saying @deprecated Will be removed in 2.Y.  Use coolNewMethod instead.  In
> >> other words, we keep deprecated around for only one or two versions.  Of
> >> course, the timing can vary.  Things like changing the Document stuff like
> >> we've talked about might last longer (or shorter, actually) while minor
> >> deprecations may only be kept for one.  The index compatibility stuff is a
> >> must.
> >>
> >> It is probably worthwhile to ask on java-user@ how many people rely on our
> >> back compat policies.
> >>
> >> <tongue in cheek> Of course, we do already support back compat for all
> >> versions:  svn checkout
> >> http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_3_1/ </tongue in
> >> cheek>
> >>
> >>
> >>
> >
> >
>
>
>

RE: Lucene's default settings & back compatibility

Posted by Steven A Rowe <sa...@syr.edu>.

On 5/21/2009 at 7:17 AM, Michael McCandless wrote:
> OK so it sounds like we've boiled the proposal down to two concrete
> changes to the back-compat policy:
> 
>   1) Default settings can change; we will always choose defaults
>      based on "latest & greatest for new users".  This only
>      affects "runtime behavior".  EG in 2.9, when sorting by
>      field you won't get scores by default.  When we do this we
>      should clearly document the change, and what settings one
>      could use to get back to the old behavior, in CHANGES.txt.
> 
>   2) An API, once released as deprecated, is fair game to be
>      removed in the next minor release.
> 
> We still only make bug fixes on point releases, support the index
> file format until the next major release -- those don't change.
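
For concreteness, proposal 1) in the quote above might look roughly like the sketch below: the out-of-the-box default follows "latest & greatest", and the old behavior stays reachable through an explicit setting that CHANGES.txt would point to. SortSettings and its methods are hypothetical names made up for illustration; they are not part of Lucene's API.

```java
// Hypothetical sketch; SortSettings is not a Lucene class.
final class SortSettings {
    private final boolean trackScores;

    private SortSettings(boolean trackScores) {
        this.trackScores = trackScores;
    }

    /** Defaults chosen for new users: no scoring when sorting by field. */
    static SortSettings defaults() {
        return new SortSettings(false);
    }

    /** Explicit opt-in to the old behavior, as CHANGES.txt would describe. */
    SortSettings withTrackScores(boolean value) {
        return new SortSettings(value);
    }

    boolean trackScores() {
        return trackScores;
    }
}

final class SortSettingsDemo {
    public static void main(String[] args) {
        SortSettings fresh = SortSettings.defaults();
        SortSettings legacy = SortSettings.defaults().withTrackScores(true);
        System.out.println("new default tracks scores: " + fresh.trackScores());
        System.out.println("restored old behavior tracks scores: " + legacy.trackScores());
    }
}
```

The point of the immutable "with" style is only that the old behavior is one documented call away; any other setter convention would serve equally well.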

Contrasting the upgrade experience of existing "maintenance" users (i.e., users not using new Lucene features) under current policy with their experience under the above proposals:

Currently there are two upgrade experiences for these users: a) upgrading within the same major release; and b) major release upgrades.  

For a), the user reads CHANGES for back-compat exceptions, but otherwise has drop-in compatibility.  For b), the user performs two upgrades: first, just like in a), to the last minor release in the same major release, addressing all deprecation warnings; and second, to the major release, with drop-in compatibility, modulo CHANGES.

Here's the upgrade procedure under the above proposals, from version X.Y to X.Z:

1. Address all deprecation warnings against the currently used Lucene version (call it version X.Y[0]).

2. Upgrade to X.(++Y), addressing all deprecation warnings and checking CHANGES for exceptions to the back-compat policy, including mechanisms to maintain X.Y[0] defaults. 

3. Iterate #2 until Y==Z.
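
The procedure above can be sketched as a small planner. This is only an illustration under assumptions: `UpgradePlan` and `steps` are made-up names, not part of Lucene, and a version is modeled as a major number plus a minor number.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene API: lists the upgrade steps implied by
// the proposed policy when moving from major.fromMinor to major.toMinor.
final class UpgradePlan {
    private UpgradePlan() {}

    static List<String> steps(int major, int fromMinor, int toMinor) {
        if (toMinor < fromMinor) {
            throw new IllegalArgumentException("target minor precedes current minor");
        }
        List<String> steps = new ArrayList<>();
        // Step 1: clean up deprecation warnings on the version in use.
        steps.add("fix deprecations against " + major + "." + fromMinor);
        // Steps 2 and 3: visit every intermediate minor release in order,
        // because a deprecated API may be gone one minor release later.
        for (int minor = fromMinor + 1; minor <= toMinor; minor++) {
            steps.add("upgrade to " + major + "." + minor
                    + ", fix deprecations, check CHANGES");
        }
        return steps;
    }
}

class UpgradePlanDemo {
    public static void main(String[] args) {
        // Print the plan for moving from 2.4 to 2.6.
        for (String step : UpgradePlan.steps(2, 4, 6)) {
            System.out.println(step);
        }
    }
}
```

The loop never skips a minor release, which is the point made below about upgrade effort no longer being amortizable.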

One consequence of these changes is that major version upgrades become the same as minor version upgrades, with the exception that index format support (and default settings support?) will potentially require attention.

Another consequence is that upgrade effort will no longer be amortizable.  Currently, maintenance users can skip minor version upgrades with almost no penalty, and defer the upgrade pain to major release upgrades, since deprecation warnings can be safely ignored.  (Not advocating this practice, just noting that it's possible.)

Steve

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Lucene's default settings & back compatibility

Posted by Earwin Burrfoot <ea...@gmail.com>.

Sounds like a good proposition.

There's one problem I'd like to address. Good names for
classes/members matter, and matter much. They directly affect how fast
a newcomer can understand a particular API, and how comfortably you
work with it once you do understand it. When we deprecate existing
methods and add new, 'better' ones, bad or mediocre names replace good
names in the parts of the code that are most often used. And there's
no way around it.

It's somewhat crazy, but what if we deprecate stuff and rename it? New
stuff gets best names, old stuff is still accessible and with a
"Migration Guide" it's easy to patch client code.

On Fri, May 22, 2009 at 00:34, Shai Erera <se...@gmail.com> wrote:
> I thought we were actually on the track towards not introducing any Settings
> and/or actAs, but instead just change the policy?
>
> Can we agree on the following:
>
> * Changes to the index file formats need to be supported for 2 major
> releases. I.e. 2.X indexes need to be read by 3.Y code, but not by 4.0.
>
> * Method deprecations last for one full minor release. That is a deprecation
> in 2.X lasts through 2.X.1, 2.X+1 but is removed in 2.X+2. If all those X's
> are confusing --> a deprecation in 2.4 is kept in 2.4.X and 2.5, but we're
> free to remove it in 2.6.
>
> * Changes to default behaviors (whether they are bug fixes or improvements),
> where it only affects runtime code, and not the index structure or indexed
> data (such as the InvalidAcronym bug fix) are ok to go into any minor
> release, w/o deprecation - so long we're documenting the change in CHANGES
> along with some sample code on how to migrate easily.
>
> * Changes to default behaviors, bug fixes or improvements, that may
> compromise the index structure or indexed data (such as InvalidAcronym) will
> last for at least one major release, if not 2 (just like supporting file
> formats). The reason is - rebuilding indexes, besides that it might be a
> heavy process, is not often acceptable by the customers of those who develop
> search solutions. Therefore it may be out of our hands. Personally, I don't
> think those will happen a lot, but when they will we can choose between:
> (1) Deprecating a class entirely in favor of a new one, such that anyone who
> upgrades can still use that class
> (2) Introduce a static setter for that behavior, like for InvalidAcronym
> (3) Add a actsAs to that class only.
>
> Am I missing a back-compat issue?
>
> What I don't like about actsAs, and perhaps I just don't understand the
> proposal well, is that I'm not sure where it's added. Will it be added to
> IndexWriter, which will pass it on to all the classes it will meet/use?
>
> If I covered all the back-compat issues above, and we agree on them, then
> for the first 3 we just need to document them on the back-compat page, no
> code to develop.
>
> For the last one, if we choose to adopt (1) or (2), then we don't need to
> develop any mehcanism up-front, but decide on a per-case basis what's the
> best alternative. For example, for the InvalidAcronym we could have
> deprecated that particular TokenFilter in favor of a new one and give a code
> example on how to create a TokenStream with that deprecated TokenFilter.
>
> Shai

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

Re: Lucene's default settings & back compatibility

Posted by Robert Muir <rc...@gmail.com>.

On Thu, May 21, 2009 at 5:55 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Thu, May 21, 2009 at 5:44 PM, Robert Muir <rc...@gmail.com> wrote:
> > and what if your analyzer needs a third-party library (or two)?
>
> In such cases the back-compat of your analyzer is your responsibility,
> right?

ICUCollationFilter is a simple example. Just saying: pretend there was a
back-compat issue with that, maybe even one specific to a certain Locale.
That could get nasty quick...

-- 
Robert Muir
rcmuir@gmail.com

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, May 21, 2009 at 5:44 PM, Robert Muir <rc...@gmail.com> wrote:
> and what if your analyzer needs a third-party library (or two)?

In such cases the back-compat of your analyzer is your responsibility,
right?

> i mean this isn't unique to analyzers, if something changes/bug is fixed in
> the guts of some query/scorer that affects scoring in the slightest then
> thats a potential issue too, right?
>
> for a big index burying a result deep is effectively the same as the
> stopword example...

If it's a bug fix, or a change in order-of-operations causing slightly
different floating point truncations, we are free to make those fixes
(even under the current back-compat policy)?  Ie, nothing is changing
for those cases.

But, say we found some improvement to how Lucene does scoring, and by
and large it improves relevance so we want to do it.  New users should
see this benefit.  Back-compat users, I think, should be able to set
actsAsVersion to get back to the old scoring model.

So yeah I think I agree it's not just changes that affect what gets
indexed, but also changes that affect how scores are computed, where
we need a way to specify a back-compat version on upgrading.

I think we can't get away with only policy changes here... I think we
need actsAsVersion to preserve back-compat.

Mike

Re: Lucene's default settings & back compatibility

Posted by Robert Muir <rc...@gmail.com>.

And what if your analyzer needs a third-party library (or two)?

I mean, this isn't unique to analyzers: if something changes or a bug is
fixed in the guts of some query/scorer in a way that affects scoring in the
slightest, then that's a potential issue too, right?

For a big index, burying a result deep is effectively the same as the
stopword example...

On Thu, May 21, 2009 at 5:27 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Thu, May 21, 2009 at 5:19 PM, Earwin Burrfoot <ea...@gmail.com> wrote:
> >> Why not store an "actsAs" in the index, just for the changes that
> >> affect what's in the index?  Ie the index records the
> >> version that created it, and by default TokenStreams emulate their
> >> behavior as of that version?
> >
> > Because you don't always have access to index at the time you create
> > your TokenStreams?
>
> Such places would have to pass in their own actsAs when they ask the
> Analyzer for the tokenStream?
>
> Ie, the benefit of this approach vs the single global default is it'd
> be per-instance.
>
> Mike


-- 
Robert Muir
rcmuir@gmail.com

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, May 21, 2009 at 5:19 PM, Earwin Burrfoot <ea...@gmail.com> wrote:
>> Why not store an "actsAs" in the index, just for the changes that
>> affect what's in the index?  Ie the index records the
>> version that created it, and by default TokenStreams emulate their
>> behavior as of that version?
>
> Because you don't always have access to index at the time you create
> your TokenStreams?

Such places would have to pass in their own actsAs when they ask the
Analyzer for the tokenStream?

Ie, the benefit of this approach vs the single global default is it'd
be per-instance.
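
A per-instance variant might look like the sketch below. Everything in it is an assumption made for illustration: `IndexVersion` and `VersionedStopFilter` are hypothetical names rather than Lucene classes, the stop lists are toy lists, and the "the" fix is the imagined example from earlier in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical version marker, assumed to be recorded in the index or
// supplied by the caller. Not a Lucene API.
enum IndexVersion { V2_4, V2_9 }

// Hypothetical filter whose default stop list depends on the version given
// to this instance, rather than on one global setting.
final class VersionedStopFilter {
    private final Set<String> stopWords;

    VersionedStopFilter(IndexVersion actsAs) {
        // Illustrative list only; a real default stop list is longer.
        Set<String> words = new HashSet<>(Arrays.asList("a", "an", "of"));
        if (actsAs.compareTo(IndexVersion.V2_9) >= 0) {
            words.add("the"); // the imagined fix, visible from 2.9 on
        }
        this.stopWords = words;
    }

    // Returns the tokens that are not stop words, in their original order.
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}

class PerInstanceActsAsDemo {
    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "fox");
        // Two filters with different versions can coexist in one JVM.
        System.out.println(new VersionedStopFilter(IndexVersion.V2_4).filter(tokens));
        System.out.println(new VersionedStopFilter(IndexVersion.V2_9).filter(tokens));
    }
}
```

Because the version is a constructor argument, code that has no index at hand has to choose a version itself, which is the cost raised in the quoted objection.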

Mike

Re: Lucene's default settings & back compatibility

Posted by Earwin Burrfoot <ea...@gmail.com>.

> Why not store an "actsAs" in the index, just for the changes that
> affect what's in the index?  Ie the index records the
> version that created it, and by default TokenStreams emulate their
> behavior as of that version?

Because you don't always have access to the index at the time you create
your TokenStreams?


-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, May 21, 2009 at 4:34 PM, Shai Erera <se...@gmail.com> wrote:

> Changes to the index file formats need to be supported for 2 major releases. I.e. 2.X indexes need to be read by 3.Y code, but not by 4.0.

Agreed.

> Method deprecations last for one full minor release.

Your example confused me.  I think if in 2.1 we deprecate a method,
then in 2.2 we can remove it?  Or are you saying it's not until 2.3 that
we can remove it (2 full minor releases)?
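
The two readings differ only in how many later minor releases must still ship the deprecated method. A minimal sketch of that arithmetic, using a hypothetical `DeprecationWindow` helper that is not part of Lucene:

```java
// Hypothetical helper, not a Lucene API.
final class DeprecationWindow {
    private DeprecationWindow() {}

    // Returns the first minor release (within the same major release) in which
    // an API deprecated in `deprecatedInMinor` may be removed, given how many
    // later minor releases must still ship it.
    static int earliestRemovalMinor(int deprecatedInMinor, int laterMinorsKept) {
        if (deprecatedInMinor < 0 || laterMinorsKept < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return deprecatedInMinor + laterMinorsKept + 1;
    }
}

class DeprecationWindowDemo {
    public static void main(String[] args) {
        // Reading 1: deprecated in 2.1, removable in the very next minor release.
        System.out.println("2." + DeprecationWindow.earliestRemovalMinor(1, 0));
        // Reading 2: deprecated in 2.1, kept through one more minor release.
        System.out.println("2." + DeprecationWindow.earliestRemovalMinor(1, 1));
    }
}
```

With `laterMinorsKept = 1`, a deprecation in 2.4 becomes removable in 2.6, which matches the 2.4/2.5/2.6 example in the quoted proposal.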

> What I don't like about actsAs, and perhaps I just don't understand the proposal well, is that I'm not sure where it's added. Will it be added to IndexWriter, which will pass it on to all the classes it will meet/use?

We would add, say, an oal.Versions class that has a static actsAs
method and static constants.  If back-compat is vital to your app,
you'd do:

  Versions.actsAs(Version.LUCENE_24)

on upgrading to 2.9.  Else, you do nothing to get "latest &
greatest".

You call this once in your app up front, and then use Lucene
normally.  Then, when IndexSearcher is asked to do field sorting, it
consults actsAs to decide whether it should do scoring or not.
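
To make this concrete, here is a rough sketch of how such a global switch could be wired up. It is not an existing Lucene API: `Version`, `Versions`, `current()` and `FieldSortDefaults` are hypothetical, and the sketch assumes the constants live in a separate `Version` enum, as the call above suggests.

```java
// Hypothetical sketch; none of these types is an actual Lucene class.
enum Version {
    LUCENE_24, LUCENE_29;

    // True if this version is the same as, or newer than, `other`.
    boolean onOrAfter(Version other) {
        return compareTo(other) >= 0;
    }
}

final class Versions {
    // Defaults to the newest version, so new users get "latest & greatest".
    private static volatile Version actsAs = Version.LUCENE_29;

    private Versions() {}

    // Called once, up front, by apps that need the older defaults.
    static void actsAs(Version version) {
        actsAs = version;
    }

    static Version current() {
        return actsAs;
    }
}

// Stand-in for the place where a searcher would pick its default.
final class FieldSortDefaults {
    private FieldSortDefaults() {}

    // Assumed rule: scoring while sorting by field is on before 2.9, off from 2.9.
    static boolean scoreWhenSortingByField() {
        return !Versions.current().onOrAfter(Version.LUCENE_29);
    }
}

class ActsAsDemo {
    public static void main(String[] args) {
        System.out.println(FieldSortDefaults.scoreWhenSortingByField());
        Versions.actsAs(Version.LUCENE_24); // opt back into 2.4 behavior
        System.out.println(FieldSortDefaults.scoreWhenSortingByField());
    }
}
```

A single static setting keeps the call site trivial, but it is shared by everything in the JVM, so two applications in one process could not pick different versions.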

> Changes to default behaviors, bug fixes or improvements, that may compromise the index structure or indexed data (such as InvalidAcronym) will last for at least one major release, if not 2

I think this harms new users unnecessarily (ie, I'd rather do actsAs
than this).  I'd like StopFilter to not discard positional
information, fixes for bugs in StandardAnalyzer, and
a correction to the default stopwords list, to be immediately available
for new users on the next release.

Why not store an "actsAs" in the index, just for the changes that
affect what's in the index?  Ie the index records the
version that created it, and by default TokenStreams emulate their
behavior as of that version?

Mike

Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

I thought we were actually on track towards not introducing any Settings
and/or actsAs, but instead just changing the policy?

Can we agree on the following:

* Changes to the index file formats need to be supported for 2 major
releases. I.e. 2.X indexes need to be read by 3.Y code, but not by 4.0.

* Method deprecations last for one full minor release. That is, a deprecation
in 2.X lasts through 2.X.1 and 2.(X+1), but is removed in 2.(X+2). If all
those X's are confusing --> a deprecation in 2.4 is kept in 2.4.X and 2.5,
but we're free to remove it in 2.6.

* Changes to default behaviors (whether they are bug fixes or improvements),
where it only affects runtime code, and not the index structure or indexed
data (such as the InvalidAcronym bug fix) are ok to go into any minor
release, w/o deprecation - so long as we document the change in CHANGES
along with some sample code on how to migrate easily.

* Changes to default behaviors, bug fixes or improvements, that may
compromise the index structure or indexed data (such as InvalidAcronym) will
keep the old behavior available for at least one major release, if not 2
(just like supporting file formats). The reason is that rebuilding indexes,
besides being a potentially heavy process, is often not acceptable to the
customers of those who develop search solutions. Therefore it may be out of
our hands. Personally, I don't think those will happen a lot, but when they
do we can choose between:
(1) Deprecating a class entirely in favor of a new one, such that anyone who
upgrades can still use that class
(2) Introducing a static setter for that behavior, like for InvalidAcronym
(3) Adding an actsAs to that class only.

Am I missing a back-compat issue?

What I don't like about actsAs, and perhaps I just don't understand the
proposal well, is that I'm not sure where it's added. Will it be added to
IndexWriter, which will pass it on to all the classes it will meet/use?

If I covered all the back-compat issues above, and we agree on them, then
for the first 3 we just need to document them on the back-compat page, no
code to develop.

For the last one, if we choose to adopt (1) or (2), then we don't need to
develop any mechanism up-front, but can decide on a per-case basis what the
best alternative is. For example, for InvalidAcronym we could have
deprecated that particular TokenFilter in favor of a new one and given a code
example on how to create a TokenStream with that deprecated TokenFilter.

Shai

On Thu, May 21, 2009 at 10:55 PM, Jason Rutherglen <
jason.rutherglen@gmail.com> wrote:

> I'm having trouble visualizing the various methods people are talking
> about.  It seems like we could open an issue and post patches with code
> illustrating what each person is talking about?

Re: Lucene's default settings & back compatibility

Posted by Jason Rutherglen <ja...@gmail.com>.

I'm having trouble visualizing the various methods people are talking
about.  It seems like we could open an issue and post patches with code
illustrating what each person is talking about?

On Thu, May 21, 2009 at 10:02 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Actually, we started with the *Settings classes (to hold defaults),
> but then realized a simple actsAsVersion (single static method) would
> suffice for just the back-compat settings and then pushed further and
> thought perhaps we should relax our back-compat policy entirely so
> emulating older versions is not needed.
>
> So we no longer have the "defaults" class (*Settings).  We may still
> do it for the future (for its own benefits), but for just back-compat
> of default settings, it seems like overkill.
>
> But I agree, the index altering cases are spooky.  I think this'd make
> me favor going back to the actsAsVersion option instead of the hard
> flip on our back compat policy (at least for default settings; for API
> changes I think 1 whole minor release may be reasonable).
>
> Mike

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

Actually, we started with the *Settings classes (to hold defaults),
but then realized a simple actsAsVersion (single static method) would
suffice for just the back-compat settings and then pushed further and
thought perhaps we should relax our back-compat policy entirely so
emulating older versions is not needed.

So we no longer have the "defaults" class (*Settings).  We may still
do it for the future (for its own benefits), but for just back-compat
of default settings, it seems like overkill.

But I agree, the index altering cases are spooky.  I think this'd make
me favor going back to the actsAsVersion option instead of the hard
flip on our back compat policy (at least for default settings; for API
changes I think 1 whole minor release may be reasonable).

Mike

On Thu, May 21, 2009 at 12:54 PM, Matthew Hall
<mh...@informatics.jax.org> wrote:
> Sorry, I wasn't quite sure what to call this new class you guys have been
> talking about.
>
> I was referring to the class that's being discussed to encapsulate all of
> the defaults for a given lucene release.  (Its caching strategies etc etc)
>
> I'm just not certain that something like a static list of words belongs in a
> higher level defaults class like you guys are talking about, especially
> considering that anyone using a stop enabled analyzer really should be
> familiar with this list, and oftentimes needs to override it.
>
> Meh, now that I'm actually typing it out, perhaps I'm incorrect here.
> Assuming this class you guys are describing will be well
> advertised/documented, maybe it will actually make it easier for end
> developers to twiddle around with this list, or at least make them
> more aware that it's even something they have the ability to change.
>
> Matt
>
> Michael McCandless wrote:
>>
>> What is the "lucene defaults class"?
>>
>> Mike
>>
>> On Thu, May 21, 2009 at 12:37 PM, Matthew Hall
>> <mh...@informatics.jax.org> wrote:
>>
>>>
>>> For extreme examples like this, couldn't the stopword list be encapsulated
>>> into a single class that's used by the lucene defaults class?
>>>
>>> That way if you folks released updates to mostly static content like a
>>> stopword list, new or old users could get it easily with a simple drop in
>>> fix?
>>>
>>> Just my two cents.
>>>
>>> Matt
>>>
>>> Michael McCandless wrote:
>>>
>>>>
>>>> On Thu, May 21, 2009 at 12:19 PM, Robert Muir <rc...@gmail.com> wrote:
>>>>
>>>>
>>>>>
>>>>> even as simple as changing default stopword list for some analyzer could be
>>>>> an issue, if the user doesn't re-index in response to that change.
>>>>>
>>>>>
>>>>
>>>> OK, right.
>>>>
>>>> So say we forgot to include "the" in the default English stopwords
>>>> list (yes, an extreme example...).
>>>>
>>>> Under the proposed changes 1 & 2 to back-compat policy, we would add
>>>> "the" to the default stopword list, so new users get the fix, but
>>>> still keep the the-less list accessible (deprecated).  We'd add an
>>>> entry in CHANGES.txt saying this happened, and then show code on how
>>>> to get back to the the-less stopword list.
>>>>
>>>> New users using that StopFilter would properly see "the" filtered out.
>>>>  Users who upgraded would need to fix their code to switch back to the
>>>> deprecated the-less list.
>>>>
>>>> Mike
>>>>


Re: Lucene's default settings & back compatibility

Posted by Earwin Burrfoot <ea...@gmail.com>.

> That bug has led to 'base' having a compromised reputation among elite users
> because of intermittent, inexplicable flakiness.  Is that what you want for
> Lucene?
While I agree with that point, Lucene already has lots and lots of
static configuration.
Having actsAsVersion won't add any new woes. Well, it won't remove them either.

I dislike both the single static variable approach and a Settings object you
have to pass into each and every constructor.
If I absolutely must choose and have no right to just throw all these
back-compat crutches out of the window, I'd choose a single static
variable.

On a side note, it's amusing to see how the discussion is calming down
and will seemingly end in minor amendments to the policy, if any at
all :)

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

OK it sounds like a single global actsAsVersion is too problematic.

So how about, for cases where back compat default settings are
important (analyzers, query scoring changes, etc.) we add
actsAsVersion as a mandatory ctor argument to those classes
(deprecating the other ctors)?  We would do this on demand, the first
time a class needs to change its default behavior.

For example, in 2.9 we'd like sorting by field to not return score by
default.  So, we'd add actsAsVersion to IndexSearcher's ctors, and
IndexSearcher then looks at the version it should emulate and sets the
defaults correctly.

New users would not use the deprecated API, and would pass
Versions.LUCENE_LATEST.  Existing users on upgrading would see that
they need to explicitly set their compatibility level (and, we'd default
it to the last version so that back-compat users don't see anything
break on upgrading).
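
A minimal sketch of the per-class constructor argument described above,
in Java. All names here (Version, LUCENE_24, LUCENE_29, SketchSearcher)
are hypothetical stand-ins, not Lucene API, and the only "default" modeled
is the sort-without-scores example:

```java
// Hypothetical stand-ins; none of these types are Lucene's real API.
enum Version {
    LUCENE_24, LUCENE_29;

    // Alias for the newest known version (assumed name).
    static final Version LUCENE_LATEST = LUCENE_29;

    boolean onOrAfter(Version other) {
        return compareTo(other) >= 0;
    }
}

class SketchSearcher {
    private final boolean scoreWhenSorting;

    /** Old ctor: kept for back compat, pinned to the previous release's defaults. */
    @Deprecated
    SketchSearcher() {
        this(Version.LUCENE_24);
    }

    /** New ctor: the caller states which release's defaults to emulate. */
    SketchSearcher(Version actsAsVersion) {
        // Assumed rule: from 2.9 on, sorting by field no longer computes scores.
        this.scoreWhenSorting = !actsAsVersion.onOrAfter(Version.LUCENE_29);
    }

    boolean scoresWhenSorting() {
        return scoreWhenSorting;
    }
}

class ActsAsVersionDemo {
    public static void main(String[] args) {
        // New user: asks for the latest defaults.
        System.out.println(new SketchSearcher(Version.LUCENE_LATEST).scoresWhenSorting()); // prints false
        // Upgrading user: explicitly keeps the 2.4 behavior.
        System.out.println(new SketchSearcher(Version.LUCENE_24).scoresWhenSorting());     // prints true
    }
}
```

The version is consulted once, in the ctor, so the rest of the class only
sees plain settings and never needs to know about versions.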

Mike

On Fri, May 22, 2009 at 9:45 AM, Grant Ingersoll <gs...@apache.org> wrote:
> Perhaps it is wise to take a step back before we play all of these "what if"
> games...
>
> I think the best way forward is to simply ask ourselves, when confronted
> with an actual issue, what the cost of back compat is for this issue, and
> then address it on a case by case basis, with a bias towards maintaining
> back compat if it is not too burdensome.  Too burdensome is a judgment call
> for the contributors and committers.
>
> As for the Settings vs. static actAs stuff, I really am on the fence.  They
> both have their downsides, so I'm inclined to punt.  Frankly, I think if
> someone wants 2.4.1 functionality and we're on 2.9 or even 3.0, but also
> wants some of the new features available on 2.9, then they should backport
> the patches.  I don't think the burden should be on us to have the trunk
> support every single setting that was ever available on a given 2.x release
> given the time frames we operate on.  The fact is, we are obsessing over the
> name of the release, when the more important factor is the time it takes to
> make the release.  If we released once a month, I'd be inclined otherwise,
> but for the reality we are in, I'm almost ready to say we should just chuck
> the whole major/minor thing and go the MS way:  Lucene 2009 and then have
> service pack releases for just that year's major release (I realize, of
> course, they likely have internal versions, etc. but maybe not)
>
> -Grant
>


Re: Lucene's default settings & back compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

Perhaps it is wise to take a step back before we play all of these  
"what if" games...

I think the best way forward is to simply ask ourselves, when
confronted with an actual issue, what the cost of back compat is
for this issue, and then address it on a case by case basis, with a
bias towards maintaining back compat if it is not too burdensome.  Too
burdensome is a judgment call for the contributors and committers.

As for the Settings vs. static actAs stuff, I really am on the fence.
They both have their downsides, so I'm inclined to punt.  Frankly, I
think if someone wants 2.4.1 functionality and we're on 2.9 or even
3.0, but also wants some of the new features available on 2.9, then
they should backport the patches.  I don't think the burden should be
on us to have the trunk support every single setting that was ever
available on a given 2.x release given the time frames we operate on.
The fact is, we are obsessing over the name of the release, when the
more important factor is the time it takes to make the release.  If we
released once a month, I'd be inclined otherwise, but for the reality
we are in, I'm almost ready to say we should just chuck the whole
major/minor thing and go the MS way:  Lucene 2009 and then have service
pack releases for just that year's major release (I realize, of
course, they likely have internal versions, etc. but maybe not)

-Grant


Re: Lucene's default settings & back compatibility

Posted by Matthew Hall <mh...@informatics.jax.org>.

Earwin Burrfoot wrote:
>
> As I said, my app uses around ten indexes, which one should I use? :)
>
>   
Even more here, this would be a reasonably painful solution for us.

Matt



Re: Lucene's default settings & back compatibility

Posted by Earwin Burrfoot <ea...@gmail.com>.

> A funny thought: we can give those methods/classes really stupid/nasty names, to emphasize the beauty of the existing API, to encourage people to stick with the better API :)
I believe I've seen google using internally names like
thisisbadbadbadInstanceMap. :)

> One thing we didn't address here fully are methods added to
> interfaces/abstract classes. When we add a method to an abstract class with
> a default impl, that's ok. But what if we need to make it abstract (like we
> had to do in 1575 for the Collector versions)?
If we adopt such relaxed back-compat policies, we'll no longer need
that abstract-class-as-an-interface craziness. If I remember correctly,
the major point of using abstract classes vs. interfaces is that you can
add methods to user-extensible things and stay jar drop-in compatible.

> If we could have the app saying something
> Version.getInstance(appId).actAsVersion(2.4) that would solve it because
> each will have its own Id, and the Version class would maintain a map
> between the Id and an instance. But I've still yet to resolve (in my mind)
> how the Lucene code will use it, since the same code runs in two apps with
> different IDs, and so won't know which appId to pass.
There's no practical difference between passing your version
everywhere, or your appId.. well.. everywhere.

> What if we continue to process Marvin's proposal on saving that information
> in the index. I think, Mike, that I asked you a similar question a while
> ago, about whether Lucene has the ability to store index versions. Index
> versions are important and can save some of the problems here - not just
> with storing stopwords list, but also code that manipulates the index, or
> makes decisions about scoring etc.
Storing metadata in the index is a good idea by itself, but I believe
that should be done at your app level, not down inside Lucene. After
all, we all need different metadata.
There's that recently added commitUserData - very cool, even if I
don't need it. Would be nice to have indexUserData, which is preserved
across commits and has read/write methods.

> Arggh .. but again we face the same problem - how do we pass that
> information to the different classes? How is a TokenStream expected to read
> that info?
Yeah, absolutely, satellite Lucene classes should not be bound to a
physical instance of the index. I use the same analyzer to process
data for something like ten indexes. Which version number is going to
be used?
So we have to get the number from the index manually and then pass it to
the classes in question. Which is absolutely the same as passing them
some version constant, or your appId.

> I think we may have to settle on the static Version class, even if it will
> read the information from the index (by doing some Version.init(File
> indexDir)).
As I said, my app uses around ten indexes, which one should I use? :)

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

>
> Your example confused me.

You're right. I wrote it with one eye closed already. I meant to say that if
I'm a 2.4 user and something gets deprecated in trunk (afterwards), it is
carried through 2.4.X and 2.5 and then removed in 2.6. So only 1 full minor
release.

> It's somewhat crazy, but what if we deprecate stuff and rename it?

I absolutely love that idea! But it means that:
1) We cannot support jar drop-in ability in those cases (which I'm fine with
because people can upgrade to 2.4.X to get bug fixes) not just because the
API does something different, but because it may not compile. For example,
the changes I'm doing in 1614 would have changed next() and skipTo()
signature, and so someone who wrote a DISI which has a next() that returns
boolean will fail to compile.
2) We give the deprecated API the mediocre names. (A funny thought: we can
give those methods/classes really stupid/nasty names, to emphasize the
beauty of the existing API, to encourage people to stick with the better API
:) ).
3) We document clearly what needs to be done in order to use the deprecated
API.

One thing we didn't address here fully are methods added to
interfaces/abstract classes. When we add a method to an abstract class with
a default impl, that's ok. But what if we need to make it abstract (like we
had to do in 1575 for the Collector versions)?
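
To illustrate the "default impl is ok" case above, here is a minimal Java
sketch. The class and method names (SketchDocIterator, nextDoc, advance,
ArrayDocIterator) are hypothetical and are not the actual DISI or Collector
signatures; the point is only that a method added with a body does not
break subclasses written earlier:

```java
/** Hypothetical extension point; not a real Lucene class. */
abstract class SketchDocIterator {
    /** Sentinel returned by nextDoc() once the iterator is exhausted. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Original abstract method that every existing subclass already implements. */
    abstract int nextDoc();

    /**
     * Added in a later release. Because it ships with a default implementation
     * built on nextDoc(), subclasses written against the old class still compile.
     * Terminates because nextDoc() eventually returns NO_MORE_DOCS.
     */
    int advance(int target) {
        int doc;
        while ((doc = nextDoc()) < target) {
            // keep scanning until we reach or pass the target
        }
        return doc;
    }
}

/** A subclass written before advance() existed; it needs no changes. */
class ArrayDocIterator extends SketchDocIterator {
    private final int[] docs;
    private int pos = -1;

    ArrayDocIterator(int[] docs) {
        this.docs = docs;
    }

    @Override
    int nextDoc() {
        pos++;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

class DefaultImplDemo {
    public static void main(String[] args) {
        SketchDocIterator it = new ArrayDocIterator(new int[] {1, 5, 9});
        System.out.println(it.advance(4)); // prints 5
    }
}
```

Had advance() been declared abstract instead, ArrayDocIterator would stop
compiling until its author implemented it, which is the case the question
above is about.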

I guess for interfaces we should first move all of them to abstract classes.
I like interfaces, but abstract classes give us slightly more freedom when
we face back-compat issues. Maybe to support Earwin's idea, we use the name
for a new abstract class, and give the interface a different name? That way
to upgrade people just need to change implements to extends (I hope that
won't cause any problems if their classes already extend something else).

But if we apply this policy to interfaces, I think more users will need to
touch their code when upgrading even minor releases.

So Mike, about actsAsVersion ... I think I'm starting to get used to it. I
do relate to what Marvin writes though, about two different apps running in
the same JVM with different settings. We have such a case - two teams
develop two search solutions (for two back-ends). They live in the same JVM
but have different development plans/schedules. So it's not just a
hypothetical problem to me.

If we could have the app saying something
Version.getInstance(appId).actAsVersion(2.4) that would solve it because
each will have its own Id, and the Version class would maintain a map
between the Id and an instance. But I've still yet to resolve (in my mind)
how the Lucene code will use it, since the same code runs in two apps with
different IDs, and so won't know which appId to pass.
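
A rough Java sketch of the per-app registry idea, for illustration only.
AppVersions and its methods are hypothetical names, and versions are plain
strings here to keep it short:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; AppVersions and its methods are illustrative names only.
final class AppVersions {
    // One instance per application id, shared across the whole JVM.
    private static final Map<String, AppVersions> INSTANCES = new ConcurrentHashMap<>();

    private volatile String actsAsVersion = "LATEST";

    private AppVersions() {}

    /** Returns the instance registered for appId, creating it on first use. */
    static AppVersions getInstance(String appId) {
        return INSTANCES.computeIfAbsent(appId, id -> new AppVersions());
    }

    AppVersions actAsVersion(String version) {
        this.actsAsVersion = version;
        return this;
    }

    String actsAsVersion() {
        return actsAsVersion;
    }
}

class AppVersionsDemo {
    public static void main(String[] args) {
        // Two teams in the same JVM, each pinning its own compatibility level.
        AppVersions.getInstance("team-a").actAsVersion("2.4");
        AppVersions.getInstance("team-b").actAsVersion("2.9");
        System.out.println(AppVersions.getInstance("team-a").actsAsVersion()); // prints 2.4
        System.out.println(AppVersions.getInstance("team-b").actsAsVersion()); // prints 2.9
    }
}
```

The sketch also makes the open question visible: every lookup needs an
appId, so shared library code would still have to be handed that id by
its caller.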

Oh well .. we're going to change the way those two teams work anyway, so for
me at least, this problem will be gone soon :)

I also agree that actsAsVersion breaks the locality principle, by which
when you see a bug you should be able to look in the surroundings where the
bug happened, and not discover that the bug stems from files away. But I
don't like passing version information in the constructors either ...

What if we continue to process Marvin's proposal on saving that information
in the index. I think, Mike, that I asked you a similar question a while
ago, about whether Lucene has the ability to store index versions. Index
versions are important and can save some of the problems here - not just
with storing stopwords list, but also code that manipulates the index, or
makes decisions about scoring etc.

For the two apps in same JVM it should solve the problem since I think we
can safely assume each operates on its own index.

Arggh .. but again we face the same problem - how do we pass that
information to the different classes? How is a TokenStream expected to read
that info?

I think we may have to settle on the static Version class, even if it will
read the information from the index (by doing some Version.init(File
indexDir)).

Shai

On Fri, May 22, 2009 at 1:53 AM, Marvin Humphrey <ma...@rectangular.com> wrote:

> On Thu, May 21, 2009 at 05:19:43PM -0400, Michael McCandless wrote:
>
> > Marvin, which solution would you prefer?
>
> Between the two, I'd prefer settings constructor arguments, though I would be
> inclined to have settings classes that are specific to individual classes
> rather than Lucene-wide.
>
> At least that scheme gets locality right.  The global actsAsVersion variable
> violates that principle and has the potential to saddle a small number of
> users who have done absolutely nothing wrong with bugs that are very, very
> hard to hunt down.  That's unfair.
>
> As far as analyzers and token streams, the theoretical answer is making
> indexes self-describing via serializable schemas, as discussed on the Lucy dev
> list, and as implemented in KinoSearch svn trunk.  With versioning metadata
> attached to the index, there is no longer any worry about upgrading analysis
> modules provided that those modules handle their own versioning correctly.
>
> For instance, in KS the Stopalizer always embeds the complete stoplist in the
> schema file, so even if we update the "English" stoplist, we don't get invalid
> search results for indexes which were created with the old stoplist.
> Similarly, it may not be possible to keep around multiple variants of
> Snowball, but at least we can fail catastrophically instead of subtly if we
> detect that the Snowball version has changed.
>
> Full-on schema serialization isn't feasible for Lucene, but attaching an
> actsAsVersion variable to an index and feeding that to your analyzers would be
> a decent start.
>
> Lastly, I think a major java Lucene release is justified already.  Won't this
> discussion die down somewhat if you can get 3.0 out?  If there are issues that
> are half done, how about rolling back whatever's in the way?
>
> Marvin Humphrey
>
>

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Fri, May 22, 2009 at 12:44 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> I'm not a lawyer, so I dislike trying to nail down every detail in
> writing and try to solve future problems in the abstract.

Agreed, and there's always leeway in what we work out here
(LUCENE-1436 is a good recent example), but I think working out broad
guidelines for us to follow is still worthwhile.  EG I think the
existing guidelines have served us well (in that it's something to
follow when working on changes).

> Lucene has never really been 100% back compatible... we've just tried
> to keep it that way... it's more of a mindset than a reality, and I'm
> wary of changing that mindset too much.  Lucene has benefited from API
> and design stability, and I think the bar should be kept high for
> changes (i.e. there should be clear benefits).
>
> Anyway, I think substantially relaxing back compat requirements is
> enough of a change that it should at some point go to a vote (once
> people figure out exactly what is being voted on ;-)

Definitely, if we can actually figure out what to vote on, we should
vote on this change...

> That doesn't apply to a static actsAsVersion that would preserve back
> compatibility by default of course.

Actually I was wanting by default to *not* preserve back compat.  Ie,
new users see Lucene's latest & greatest for free; old users must set
the back-compat level.

> Depending on the specifics, it may often be simpler/cleaner to create
> a new class / constructor and deprecate the old, as we do now.

True, and actually this is a viable workaround/fallback if we can't
otherwise come to agreement, to let new users see the best of Lucene
by default.

So eg we could deprecate:

  public TopFieldDocs search(Query query, Filter filter, int n,
                             Sort sort)

and add:

  public TopFieldDocs search(Query query, Filter filter, int n,
                             Sort sort, boolean includeScores)

(or something along those lines).

Or, deprecate IndexReader.open in favor of open(boolean readOnly).

It's not an unreasonable approach, in that whenever there is a setting
that needs to change in a given release, we simply make it explicit.
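
As a rough Java sketch of that "make it explicit" pattern: the old
signature stays, marked deprecated, and simply delegates to the new one
with the old default. SketchFieldSearcher and Hits are hypothetical
stand-ins, and the Query/Filter/Sort arguments are left out so the sketch
compiles on its own:

```java
// Hypothetical stand-in for a searcher; not the real IndexSearcher API.
class SketchFieldSearcher {
    /** Minimal result holder: one score per hit, NaN when scoring was skipped. */
    static final class Hits {
        final float[] scores;

        Hits(float[] scores) {
            this.scores = scores;
        }
    }

    /** Old signature: preserves the old behavior (scores are always computed). */
    @Deprecated
    Hits search(int n) {
        return search(n, true);
    }

    /** New signature: the caller decides explicitly whether scores are wanted. */
    Hits search(int n, boolean includeScores) {
        float[] scores = new float[n];
        for (int i = 0; i < n; i++) {
            // Placeholder value; a real searcher would compute relevance here.
            scores[i] = includeScores ? 1.0f : Float.NaN;
        }
        return new Hits(scores);
    }
}

class ExplicitSettingDemo {
    public static void main(String[] args) {
        SketchFieldSearcher searcher = new SketchFieldSearcher();
        System.out.println(Float.isNaN(searcher.search(3, false).scores[0])); // prints true
    }
}
```

Existing callers keep compiling and keep their scores; the deprecation
warning is what nudges them to pick a value for the new argument.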

Though this is more awkward for bug fixes to StandardAnalyzer (for
example).  What would the new ctor look like?  I guess you'd pass in
the "invalidAcronyms" to the ctor.

>>  4. [Maybe?] Allow certain limited changes that will require source
>>     code changes in your app on upgrading to a new minor release:
>>     adding a new method to an interface, adding a new abstract method
>>     to an abstract class, renaming of deprecated methods.
>
> +1, depending on the specifics.  This is where back compat rules
> shouldn't be cast in stone.
> There are some public classes in Lucene that are really just
> implementation artifacts - pretty much no one will directly use those
> classes and changes to those shouldn't be a big deal.

Right.

Mike


Re: Lucene's default settings & back compatibility

Posted by Yonik Seeley <yo...@lucidimagination.com>.

I'm not a lawyer, so I dislike trying to nail down every detail in
writing and try to solve future problems in the abstract.

Lucene has never really been 100% back compatible... we've just tried
to keep it that way... it's more of a mindset than a reality, and I'm
wary of changing that mindset too much.  Lucene has benefited from API
and design stability, and I think the bar should be kept high for
changes (i.e. there should be clear benefits).

Anyway, I think substantially relaxing back compat requirements is
enough of a change that it should at some point go to a vote (once
people figure out exactly what is being voted on ;-)
That doesn't apply to a static actsAsVersion that would preserve back
compatibility by default of course.

>  3. Default settings can change, but if the change is big enough (and
>     certainly if it will impact what's indexed or how searches find
>     docs/do scoring), we add a required "actsAsVersion" arg to the
>     ctor of the affected class.  New users get the latest & greatest,
>     and upgraded users keep their old defaults.

If we get to the point of passing something around, it might as well
be a Settings object, unless it's an inner loop efficiency thing.
Depending on the specifics, it may often be simpler/cleaner to create
a new class / constructor and deprecate the old, as we do now.

>  4. [Maybe?] Allow certain limited changes that will require source
>     code changes in your app on upgrading to a new minor release:
>     adding a new method to an interface, adding a new abstract method
>     to an abstract class, renaming of deprecated methods.

+1, depending on the specifics.  This is where back compat rules
shouldn't be cast in stone.
There are some public classes in Lucene that are really just
implementation artifacts - pretty much no one will directly use those
classes and changes to those shouldn't be a big deal.

-Yonik
http://www.lucidimagination.com


Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

Earwin Burrfoot wrote:
>
>>  4. [Maybe?] Allow certain limited changes that will require source
>>     code changes in your app on upgrading to a new minor release:
>>     adding a new method to an interface, adding a new abstract method
>>     to an abstract class, renaming of deprecated methods.
>>     
> Yahoo! The right to rename deprecated things makes the need to
> deprecate VS simply remove bearable.

I've also noticed the ugly name problem. I would be in favor of a 
cleanup of ugly names.

Using the existing policy mechanism, one could (I haven't thought this 
through):

In 3.0, remove the deprecations.

Do a 3.9 release with:
a) add methods and classes with the good names. These should be an exact 
copy of the ugly named code.
b) deprecate the ugly names.
c) no other changes.

Release 4.0 with deprecations removed.

These three releases could happen simultaneously.

(Of course, if we want to do this, we could have a policy that we have a
2.9.0 and a 2.9.1 (rather than 3.9) followed by a 3.0 with good names.)

Now we are back to good names. And drifting can start all over again.

-- DM


Re: Lucene's default settings & back compatibility

Posted by Earwin Burrfoot <ea...@gmail.com>.

>  1. If we deprecate an API in the 2.1 release, we can remove it in
>     the next minor release (2.2).
Agree. Maybe also this?
1a. If deprecated functionality is trivially implemented with new one,
we reserve the right to delete deprecated things right away with
appropriate CHANGES note.
Sample I:
getOmitTF() is deprecated in favor of
getOmitTermFrequenciesAndPositions() which is absolutely identical to
the former, save for the name
getOmitTF is removed, CHANGES.txt contains a line telling the user to
do search/replace on his code

Sample II:
score(HitCollector, int) is deprecated in favor of score(Collector, int)
score(HitCollector, int) is removed, CHANGES.txt contains a line
telling the user to wrap his collector with new HitCollectorWrapper()
or reimplement
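
For Sample II, the "wrap your collector" instruction amounts to an adapter.
A minimal Java sketch, using simplified hypothetical types
(LegacyHitCollector, SegmentCollector, LegacyCollectorAdapter) rather than
the real HitCollector/Collector signatures:

```java
import java.util.ArrayList;
import java.util.List;

/** Old-style callback: one call per hit, with a global doc id and its score. */
abstract class LegacyHitCollector {
    abstract void collect(int doc, float score);
}

/** New-style callback (simplified): doc ids are relative to a base set beforehand. */
abstract class SegmentCollector {
    abstract void setDocBase(int docBase);

    abstract void collect(int segmentDoc, float score);
}

/** Adapter that lets code written against the old callback run under the new one. */
final class LegacyCollectorAdapter extends SegmentCollector {
    private final LegacyHitCollector delegate;
    private int docBase;

    LegacyCollectorAdapter(LegacyHitCollector delegate) {
        this.delegate = delegate;
    }

    @Override
    void setDocBase(int docBase) {
        this.docBase = docBase;
    }

    @Override
    void collect(int segmentDoc, float score) {
        // Re-base to the global doc id that the old collector expects.
        delegate.collect(docBase + segmentDoc, score);
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        SegmentCollector collector = new LegacyCollectorAdapter(new LegacyHitCollector() {
            @Override
            void collect(int doc, float score) {
                seen.add(doc);
            }
        });
        collector.setDocBase(10);
        collector.collect(3, 1.0f);
        System.out.println(seen); // prints [13]
    }
}
```

With an adapter like this available, the CHANGES.txt note can be a
one-line instruction, which is what makes immediate removal of the
deprecated method tolerable.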

>  2. JAR drop-in-ability is only guaranteed on point releases (2.4.1
>     is a drop-in replacement to 2.4.0).  When switching to a new
>     minor release (2.1 -> 2.2) likely you'll need to recompile.
Agree.

>  3. Default settings can change, but if the change is big enough (and
>     certainly if it will impact what's indexed or how searches find
>     docs/do scoring), we add a required "actsAsVersion" arg to the
>     ctor of the affected class.  New users get the latest & greatest,
>     and upgraded users keep their old defaults.
What about a default value for actsAs? I mean, if I agree with using
Version.LATEST_AND_GREATEST, why do I have to explicitly set it every
time?

>  4. [Maybe?] Allow certain limited changes that will require source
>     code changes in your app on upgrading to a new minor release:
>     adding a new method to an interface, adding a new abstract method
>     to an abstract class, renaming of deprecated methods.
Yahoo! The right to rename deprecated things makes the need to
deprecate VS simply remove bearable.

One more thing. What about adopting fixed release schedule?
I.e. minor releases are done strictly each three/four/six months.
Whatever is finished - goes into release, whatever is not - continues
rotting as patches in jira.
Either this, or my addition to rule #1 has to be made, or we're going
to wait for two years before removing something. First year until a
minor release that declares deprecation, another year until next minor
release that actually removes garbage.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


Re: Lucene's default settings & back compatibility

Posted by Mark Miller <ma...@gmail.com>.

Right - I'd actually hold off now. I figured the threat of sending might 
prompt some action ;)

It still wouldn't hurt to know what the users think, perhaps at a more
digestible, overview level though.

I do think Yonik torpedoed something this liberal :)

That's not a bad thing though. We will find the right answer somewhere
between the two of you I hope.

We may already be at some half way point - we have experimental apis and 
exceptions at an ever growing rate.

As you also mention, as more of the code moves to abstract classes, back 
compat is eased anyway.

- Mark

Shai Erera wrote:
> Well .. to be honest I haven't monitored java-user for quite some 
> time, so I don't know if it hasn't been raised there.
>
> But now there's the other thread that Yonik started, so I'm not really 
> sure where to answer.
>
> I think that if we look back at 2.0 and compare to 2.9, anyone
> upgrading from that version to 2.9 is going to need to learn a lot
> about Lucene. It's not just deprecation, but best practices, different
> approaches for different situations etc. For example,
> ConstantScoreQuery is not a *default* thing - I need to know it exists
> and what benefits it gives me, in order to use it. So no
> back-compat / deprecation stuff would teach me how to use it. Nor will
> I miraculously understand that I'd better not score when sorting.
> Yes, the API has changed, but not in a way that I can now understand.
> Maybe we've documented it well, dunno ...
>
> If people upgrade from 2.0 to 2.9, then their lives would be a lot 
> easier if 2.9 provided the greatest and latest right out-of-the-box. 
> So yes, they'd need to fix all the deprecations, but that's easy 
> because we document the alternative. Add that to the "best defaults" 
> and we've got a good code migration story.
>
> Again, as long as we release every ~6 months (and I don't think we
> should release sooner), I don't think it's such a problem to ask
> someone to make minor modifications/maintenance to his code every
> 1 year (!). Especially since we believe a major release will come every
> ~2 years, by which I need to re-build my indices, which is by far a
> more costly operation (sometimes out of your hands) than updating code.
>
> So relaxing the back-compat a bit overall does not seem like a great 
> "crime against the Lucene users" to me - all is done (>98% of the 
> time?) for the better.
>
> But maybe these days will pass soon. If we continue to get rid of 
> interfaces and adopt abstract classes, perhaps we won't have to work so 
> hard to improve things. In LUCENE-1614 it was quite easy to improve DISI since it 
> is an abstract class.
>
> Shai
>
> On Wed, Jun 10, 2009 at 7:32 PM, Mark Miller <markrmiller@gmail.com> wrote:
>
>     No one really responded to this Shai? And I take it that the user
>     list never saw it?
>
>     Perhaps we should just ask for opinion from the user list based on
>     what you already have - just to gauge the reaction on different
>     points. Unless someone responds shortly, we could take a year
>     waiting to shake it out.
>     The threat of sending should prompt anyone with any issues to
>     speak up.
>
>     I think we should add though:
>       * explicitly what has changed (eg if we switch something, what was
>         the policy before - most users won't even know)
>       * an overview of why we are interested in relaxing back compat
>
>     - Mark
>
>     Shai Erera wrote:
>
>         Ok, so digging back in this thread, I think the following
>         proposals were made (if I missed some, please add them):
>
>         1. API deprecations last *at least* one full minor release.
>         Example: if we deprecate an API in 2.4, we can remove it in
>         2.5. BUT, we are also free to keep it there and remove it in
>         2.6, 2.9, 3.0, 3.5. I would like to reserve that option for
>         controversial deprecations, like TokenStream, and maybe even
>         the HitCollector recent changes. Those that we feel will have
>         a large impact on the users, we might want to keep around for
>         a bit longer until we get enough feedback from the field and
>         are more confident with that change.
>
>         2. Bugs are fixed backwards on the last "dot" release only.
>         Example: a bug that's discovered after 2.4 is released is
>         fixed on the 2.4.X branch. Once 2.5 is released, any bug fixes
>         happen on trunk and 2.5.X. A slight relaxation would be adding
>         something like "we may still fix bugs on the 2.4.X branch if
>         we feel it's important enough", for example if 2.5 contains a
>         lot of API changes and we think a considerable portion of our
>         users are still on 2.4.
>
>         3. Jar drop-in ability is only guaranteed on point releases
>         (this is partly an outcome of (1) and (2), but (6) will
>         also affect it).
>
>         4. Changes to the index format last at least one full major
>         release. Example: a change to the index format in 2.X, is
>         supported in all 3.Y releases, and removed in 4.0. Again, I
>         write "at least" since we should have the freedom to extend
>         support for a particular change.
>
>         5. Changes to the default settings are allowed between minor
>         releases, provided that we give the users a way to revert back
>         to the old behavior. Examples are LUCENE-1542 and the latest
>         issues Mike opened. Those changes will be applied
>         out-of-the-box. The provided API to revert to the old behavior
>         may be a supported API, or a deprecated API. For deprecation
>         we can decide to keep the API longer than one minor release.
>
>         5.1) An exception to (5) are bug fixes which break back-compat
>         - those are always visible, w/ a way to revert to the buggy
>         behavior. That way may be deprecated or not, and its support
>         lifetime can be made on a case-by-case basis.
>
>         6. Minor changes to APIs can happen w/o any deprecation.
>         Example: LUCENE-1614, adding one or two methods to an interface
>         with good documentation and a trivial proposed implementation, etc.
>
>         You will notice that almost every proposal has a "we may
>         decide to keep it for longer" - I wrote it following one of
>         the early responses on this thread (I think it was Grant's) -
>         we should not attempt to set things in stone. Our back-compat
>         policy should ensure some level of SLA to our users, but
>         otherwise we should not act as robots, and if we think a
>         certain case requires a different handling than the policy
>         states (only for the user's benefit though), it should be done
>         that way. The burden is still put on the committers, only now
>         the policy is relaxed a bit, and handles different cases in
>         different ways, and the committers/contributors don't need to
>         feel that their hands are tied.
>
>         These set the ground/basis, but otherwise we should decide on
>         a case-by-case basis on any extension/relaxation of the
>         policy, for our users' benefits. After quite some time I've
>         been following the discussions on this mailing list, I don't
>         remember ever seeing an issue being driven against our users'
>         benefit. All issues attempt to improve Lucene's performance
>         and our users' experience (end users as well as search
>         application developers). I think it's only fair to ask this
>         "users" community be more forgiving and open to make changes
>         on their side too, making the life of the
>         committers/contributors a bit easier.
>
>         I also agree that the next step would be taking this to
>         java-user and get a sense of whether our "users" community
>         agree with those changes or not. I hope that the above summary
>         captures what's needed to be sent to this list.
>
>         Shai
>
>         On Sat, May 30, 2009 at 2:21 PM, Michael McCandless
>         <lucene@mikemccandless.com> wrote:
>
>            Actually, I think this is a common, and in fact natural/expected
>            occurrence in open-source.  When a tricky topic is discussed, and the
>            opinions are often divergent, frequently the conversation never
>            "converges" to a consensus and the discussion dies.  Only if
>            discussion reaches a semblance of consensus do we vote on it.
>
>            It's exactly like what happens when a controversial bill tries to go
>            through the US congress.  It's heavily discussed and then dies off
>            from lack of consensus, or, it gets far enough to be voted on.
>
>            Ie, this is completely normal for open source.
>
>            We may not like it, we may consider it inefficient, annoying,
>            frustrating, whatever, but this is in fact a reality of all healthy
>            open-source projects.
>
>            Consensus building is not easy, and if the number of people trying to
>            build consensus, by iterating on the proposal, compromising,
>            suggesting alternatives when others dislike an approach, etc., is
>            dwarfed by the number of people objecting to the proposal, then
>            consensus never emerges.
>
>            In this case specifically, I had a rather singular goal: the freedom
>            to make changes to defaults inside Lucene to always favor new users,
>            while not hurting back-compat users.  I intentionally proposed no
>            changes to our back-compat policy (knowing reaching consensus would be
>            that much more difficult).
>
>            The proposal went through several iterations (*settings,
>            *actsAsVersion, etc) that all failed to reach consensus, so we settled
>            back on the current approach of "make the setting explicit" which is
>            an OK workaround.  One by one I've been doing that for the original
>            examples I listed (readOnly IndexReader, NIOFSDir default, etc.)
>
>            But, then, the conversation shifted to a different topic ("how to
>            relax our back-compat policy"), which also failed to reach consensus.
>
>            Maybe, the best way forward is to break out each of the separate
>            bullets and discuss them separately?
>
>            Mike
>
>            On Fri, May 29, 2009 at 11:22 PM, Shai Erera
>            <serera@gmail.com> wrote:
>            > So ... I've seen this happen a lot of times (especially in my thesis
>            > work) - someone raises a controversial topic, or one that touches the
>            > nerves of the system, there's a flurry of activity and then it dies
>            > unexpectedly, even though it feels to everyone that there's "an extra
>            > mile" that should be taken in order to bring it to completion.
>            >
>            > And that's what I've seen in this thread. A lot has been said - lots of
>            > comments, ideas, opinions. Lots of ranting and complaining. Then it
>            > died ... Thank you Grant for that last "beep", I hope that was an
>            > intention to resurrect it.
>            >
>            > So I ask - how come we don't have a decision? Is it because we're
>            > "afraid" to make a decision? (that last sentence is supposed to "tease"
>            > the community, not to pass judgement)
>            >
>            > I'm asking because it seems like everybody pretty much agrees on most
>            > of the suggestions, so why not decide "let's do X, Y and Z" and change
>            > the back-compat page starting from 2.9? If people don't remember the
>            > decisions, I don't mind reiterating them.
>            >
>            > (I also ask because I'd like to take the improvements from LUCENE-1614
>            > to TermDocs/Positions, PhrasePositions, Spans. All except
>            > PhrasePositions are public interfaces and so it matters if I need to go
>            > through creating abstract classes, with new names, or I can change
>            > those interfaces, asking those that implemented their own TermDocs to
>            > modify the code).
>            >
>            > Shai
>            >
>            > On Wed, May 27, 2009 at 10:36 PM, Grant Ingersoll
>            > <gsingers@apache.org> wrote:
>            >>
>            >> So, here's a real, concrete example of the need for case by case back
>            >> compat.  See https://issues.apache.org/jira/browse/LUCENE-1662
>            >>
>            >> It's completely stupid that ExtendedFieldCache even exists.  It is a
>            >> dumb workaround for a made up problem that has nothing to do with real
>            >> coders living in the modern age of development where IDEs make
>            >> refactoring these types of things very cheap.  Namely, the notion that
>            >> interfaces must never change lest every 6-9 months some minute number
>            >> of users (I'd venture it's less than 1% of users) out there, who by any
>            >> account are completely capable of implementing hard core Lucene
>            >> internals (like extending FieldCache), yet are seemingly incapable of
>            >> reading a CHANGES file with a huge disclaimer in it, have to recompile
>            >> (GASP!) their code and put in a dummy implementation of some new
>            >> interface method.  Yet, here we are with Yonik fixing very real
>            >> problems that are a direct result of coding around back compat (along
>            >> with a mistake; it took a long time for this issue to even be
>            >> discovered) that very much affect the usability of Lucene and the day
>            >> to day experience of a good number of users.
>            >>
>            >> In other words, the real fix for LUCENE-1662 is for ExtFieldCache to
>            >> be folded into FieldCache and for the file to be removed, never to be
>            >> heard from again.
>            >>
>            >> The same can be said for the whole Fieldable issue, but that's a
>            >> different day.
>            >>
>            >> Ranting,
>            >> Grant
>            >>
>            >>
>            >> ---------------------------------------------------------------------
>            >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>            >> For additional commands, e-mail: java-dev-help@lucene.apache.org
>            >>
>            >
>            >
>
>     -- 
>     - Mark
>
>     http://www.lucidimagination.com


-- 
- Mark

http://www.lucidimagination.com





Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

Well .. to be honest I haven't monitored java-user for quite some time, so I
don't know whether it has been raised there.

But now there's the other thread that Yonik started, so I'm not really sure
where to answer.

I think that if we look back at 2.0 and compare to 2.9, anyone upgrading
from that version to 2.9 is going to need to learn a lot about Lucene. It's
not just deprecation, but best practices, different approaches for different
situations etc. For example, ConstantScoreQuery is not a *default* thing - I
need to know it exists and what benefits it gives me, in order to use
it. So no back-compat / deprecation stuff would teach me how to use it. Nor
will I miraculously understand that I'd better not score when sorting. Yes,
the API has changed, but not in a way that I can now understand. Maybe we've
documented it well, dunno ...

If people upgrade from 2.0 to 2.9, then their lives would be a lot easier if
2.9 provided the greatest and latest right out-of-the-box. So yes, they'd
need to fix all the deprecations, but that's easy because we document the
alternative. Add that to the "best defaults" and we've got a good code
migration story.
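
To make the "best defaults plus a documented alternative" idea concrete,
here is a minimal sketch. The class and method names are hypothetical and
are not an actual Lucene API; it only illustrates shipping a new
out-of-the-box value while keeping a deprecated switch for applications
that need the old behavior.

```java
// Hypothetical class; it only illustrates the "new default plus a
// deprecated way back" pattern, not any real Lucene API.
class SortSettings {
    // New out-of-the-box behavior: skip scoring when sorting by field.
    private boolean scoreWhenSorting = false;

    boolean isScoreWhenSorting() {
        return scoreWhenSorting;
    }

    /**
     * Restores the pre-change behavior.
     *
     * @deprecated kept only so existing applications can opt back in;
     *             how long it stays would be decided case by case.
     */
    @Deprecated
    void setScoreWhenSorting(boolean enabled) {
        this.scoreWhenSorting = enabled;
    }
}

class DefaultsDemo {
    public static void main(String[] args) {
        // A new user gets the improved default without doing anything.
        SortSettings fresh = new SortSettings();
        System.out.println(fresh.isScoreWhenSorting());

        // An existing application opts back into the old behavior.
        SortSettings legacy = new SortSettings();
        legacy.setScoreWhenSorting(true);
        System.out.println(legacy.isScoreWhenSorting());
    }
}
```

The upgrade cost for an existing application is then a single call, which
the deprecation note can point to.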

Again, as long as we release every ~6 months (and I don't think we should
release sooner), I don't think it's such a problem to request someone to
make minor modifications/maintenance to his code every 1 year (!). Especially
since we believe a major release will come every ~2 years, at which point I need
to re-build my indices, which is by far a more costly operation (sometimes
out of your hands) than updating code.

So relaxing the back-compat a bit overall does not seem like a great "crime
against the Lucene users" to me - all is done (>98% of the time?) for the
better.

But maybe these days will pass soon. If we continue to get rid of interfaces
and adopt abstract classes, perhaps we won't have to work so hard to improve
things. In LUCENE-1614 it was quite easy to improve DISI since it is an abstract
class.
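
A rough sketch of why the abstract class made that easy, assuming a
simplified iterator (the names below are made up and this is not the real
DocIdSetIterator API): a method added later with a default body does not
force existing subclasses to change, whereas adding it to an interface
would have broken every implementor at the time.

```java
// Hypothetical names throughout; not Lucene's DocIdSetIterator API.
abstract class DocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Original contract: subclasses only had to implement this. */
    abstract int nextDoc();

    /**
     * Added in a later release. Because it ships with a default body,
     * subclasses written before it existed compile unchanged.
     */
    int advance(int target) {
        int doc;
        while ((doc = nextDoc()) < target) {
            // linear scan; a subclass may override with something faster
        }
        return doc;
    }
}

/** A "user" subclass written before advance() was added. */
class ArrayDocIterator extends DocIterator {
    private final int[] docs;
    private int pos = -1;

    ArrayDocIterator(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    @Override
    int nextDoc() {
        pos++;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

class AbstractClassDemo {
    public static void main(String[] args) {
        DocIterator it = new ArrayDocIterator(new int[] {2, 5, 9});
        System.out.println(it.advance(4)); // first doc id >= 4
        System.out.println(it.nextDoc());  // the doc id after that
    }
}
```

Here ArrayDocIterator never mentions advance(), yet callers can use it.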

Shai

On Wed, Jun 10, 2009 at 7:32 PM, Mark Miller <ma...@gmail.com> wrote:

> No one really responded to this Shai? And I take it that the user list
> never saw it?
>
> Perhaps we should just ask for opinion from the user list based on what you
> already have - just to gauge the reaction on different points. Unless
> someone responds shortly, we could take a year waiting to shake it out.
> The threat of sending should prompt anyone with any issues to speak up.
>
> I think we should add though:
>   * explicitly what has changed (eg if we switch something, what was the
>     policy before - most users won't even know)
>   * an overview of why we are interested in relaxing back compat
>
> - Mark
>
> Shai Erera wrote:
>
>> Ok, so digging back in this thread, I think the following proposals were
>> made (if I missed some, please add them):
>>
>> 1. API deprecations last *at least* one full minor release. Example: if we
>> deprecate an API in 2.4, we can remove it in 2.5. BUT, we are also free to
>> keep it there and remove it in 2.6, 2.9, 3.0, 3.5. I would like to reserve
>> that option for controversial deprecations, like TokenStream, and maybe even
>> the HitCollector recent changes. Those that we feel will have a large impact
>> on the users, we might want to keep around for a bit longer until we get
>> enough feedback from the field and are more confident with that change.
>>
>> 2. Bugs are fixed backwards on the last "dot" release only. Example: a bug
>> that's discovered after 2.4 is released is fixed on the 2.4.X branch. Once
>> 2.5 is released, any bug fixes happen on trunk and 2.5.X. A slight relaxation
>> would be adding something like "we may still fix bugs on the 2.4.X branch if
>> we feel it's important enough", for example if 2.5 contains a lot of API
>> changes and we think a considerable portion of our users are still on 2.4.
>>
>> 3. Jar drop-in ability is only guaranteed on point releases (this is
>> partly an outcome of (1) and (2), but (6) will also affect it).
>>
>> 4. Changes to the index format last at least one full major release.
>> Example: a change to the index format in 2.X, is supported in all 3.Y
>> releases, and removed in 4.0. Again, I write "at least" since we should have
>> the freedom to extend support for a particular change.
>>
>> 5. Changes to the default settings are allowed between minor releases,
>> provided that we give the users a way to revert back to the old behavior.
>> Examples are LUCENE-1542 and the latest issues Mike opened. Those changes
>> will be applied out-of-the-box. The provided API to revert to the old
>> behavior may be a supported API, or a deprecated API. For deprecation we can
>> decide to keep the API longer than one minor release.
>>
>> 5.1) An exception to (5) are bug fixes which break back-compat - those are
>> always visible, w/ a way to revert to the buggy behavior. That way may be
>> deprecated or not, and its support lifetime can be made on a case-by-case
>> basis.
>>
>> 6. Minor changes to APIs can happen w/o any deprecation. Example:
>> LUCENE-1614, adding one or two methods to an interface with good
>> documentation and a trivial proposed implementation, etc.
>>
>> You will notice that almost every proposal has a "we may decide to keep it
>> for longer" - I wrote it following one of the early responses on this thread
>> (I think it was Grant's) - we should not attempt to set things in stone. Our
>> back-compat policy should ensure some level of SLA to our users, but
>> otherwise we should not act as robots, and if we think a certain case
>> requires a different handling than the policy states (only for the user's
>> benefit though), it should be done that way. The burden is still put on the
>> committers, only now the policy is relaxed a bit, and handles different
>> cases in different ways, and the committers/contributors don't need to feel
>> that their hands are tied.
>>
>> These set the ground/basis, but otherwise we should decide on a
>> case-by-case basis on any extension/relaxation of the policy, for our users'
>> benefits. After quite some time I've been following the discussions on this
>> mailing list, I don't remember ever seeing an issue being driven against our
>> users' benefit. All issues attempt to improve Lucene's performance and our
>> users' experience (end users as well as search application developers). I
>> think it's only fair to ask this "users" community be more forgiving and
>> open to make changes on their side too, making the life of the
>> committers/contributors a bit easier.
>>
>> I also agree that the next step would be taking this to java-user and get
>> a sense of whether our "users" community agree with those changes or not. I
>> hope that the above summary captures what's needed to be sent to this list.
>>
>> Shai
>>
>> On Sat, May 30, 2009 at 2:21 PM, Michael McCandless
>> <lucene@mikemccandless.com> wrote:
>>
>>    Actually, I think this is a common, and in fact natural/expected
>>    occurrence in open-source.  When a tricky topic is discussed, and the
>>    opinions are often divergent, frequently the conversation never
>>    "converges" to a consensus and the discussion dies.  Only if
>>    discussion reaches a semblance of consensus do we vote on it.
>>
>>    It's exactly like what happens when a controversial bill tries to go
>>    through the US congress.  It's heavily discussed and then dies off
>>    from lack of consensus, or, it gets far enough to be voted on.
>>
>>    Ie, this is completely normal for open source.
>>
>>    We may not like it, we may consider it inefficient, annoying,
>>    frustrating, whatever, but this is in fact a reality of all healthy
>>    open-source projects.
>>
>>    Consensus building is not easy, and if the number of people trying to
>>    build consensus, by iterating on the proposal, compromising,
>>    suggesting alternatives when others dislike an approach, etc., is
>>    dwarfed by the number of people objecting to the proposal, then
>>    consensus never emerges.
>>
>>    In this case specifically, I had a rather singular goal: the freedom
>>    to make changes to defaults inside Lucene to always favor new users,
>>    while not hurting back-compat users.  I intentionally proposed no
>>    changes to our back-compat policy (knowing reaching consensus would be
>>    that much more difficult).
>>
>>    The proposal went through several iterations (*settings,
>>    *actsAsVersion, etc) that all failed to reach consensus, so we settled
>>    back on the current approach of "make the setting explicit" which is
>>    an OK workaround.  One by one I've been doing that for the original
>>    examples I listed (readOnly IndexReader, NIOFSDir default, etc.)
>>
>>    But, then, the conversation shifted to a different topic ("how to
>>    relax our back-compat policy"), which also failed to reach consensus.
>>
>>    Maybe, the best way forward is to break out each of the separate
>>    bullets and discuss them separately?
>>
>>    Mike
>>
>>    On Fri, May 29, 2009 at 11:22 PM, Shai Erera <serera@gmail.com> wrote:
>>    > So ... I've seen this happen a lot of times (especially in my thesis
>>    > work) - someone raises a controversial topic, or one that touches the
>>    > nerves of the system, there's a flurry of activity and then it dies
>>    > unexpectedly, even though it feels to everyone that there's "an extra
>>    > mile" that should be taken in order to bring it to completion.
>>    >
>>    > And that's what I've seen in this thread. A lot has been said - lots of
>>    > comments, ideas, opinions. Lots of ranting and complaining. Then it
>>    > died ... Thank you Grant for that last "beep", I hope that was an
>>    > intention to resurrect it.
>>    >
>>    > So I ask - how come we don't have a decision? Is it because we're
>>    > "afraid" to make a decision? (that last sentence is supposed to "tease"
>>    > the community, not to pass judgement)
>>    >
>>    > I'm asking because it seems like everybody pretty much agrees on most
>>    > of the suggestions, so why not decide "let's do X, Y and Z" and change
>>    > the back-compat page starting from 2.9? If people don't remember the
>>    > decisions, I don't mind reiterating them.
>>    >
>>    > (I also ask because I'd like to take the improvements from LUCENE-1614
>>    > to TermDocs/Positions, PhrasePositions, Spans. All except
>>    > PhrasePositions are public interfaces and so it matters if I need to go
>>    > through creating abstract classes, with new names, or I can change
>>    > those interfaces, asking those that implemented their own TermDocs to
>>    > modify the code).
>>    >
>>    > Shai
>>    >
>>    > On Wed, May 27, 2009 at 10:36 PM, Grant Ingersoll
>>    > <gsingers@apache.org> wrote:
>>    >>
>>    >> So, here's a real, concrete example of the need for case by case back
>>    >> compat.  See https://issues.apache.org/jira/browse/LUCENE-1662
>>    >>
>>    >> It's completely stupid that ExtendedFieldCache even exists.  It is a
>>    >> dumb workaround for a made up problem that has nothing to do with real
>>    >> coders living in the modern age of development where IDEs make
>>    >> refactoring these types of things very cheap.  Namely, the notion that
>>    >> interfaces must never change lest every 6-9 months some minute number
>>    >> of users (I'd venture it's less than 1% of users) out there, who by any
>>    >> account are completely capable of implementing hard core Lucene
>>    >> internals (like extending FieldCache), yet are seemingly incapable of
>>    >> reading a CHANGES file with a huge disclaimer in it, have to recompile
>>    >> (GASP!) their code and put in a dummy implementation of some new
>>    >> interface method.  Yet, here we are with Yonik fixing very real
>>    >> problems that are a direct result of coding around back compat (along
>>    >> with a mistake; it took a long time for this issue to even be
>>    >> discovered) that very much affect the usability of Lucene and the day
>>    >> to day experience of a good number of users.
>>    >>
>>    >> In other words, the real fix for LUCENE-1662 is for ExtFieldCache to
>>    >> be folded into FieldCache and for the file to be removed, never to be
>>    >> heard from again.
>>    >>
>>    >> The same can be said for the whole Fieldable issue, but that's a
>>    >> different day.
>>    >>
>>    >> Ranting,
>>    >> Grant
>>    >>
>>    >>
>>    >>
>>    >
>>    >
>>
>>
>>
>>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>
>

Re: Lucene's default settings & back compatibility

Posted by Mark Miller <ma...@gmail.com>.

No one really responded to this, Shai? And I take it that the user list 
never saw it?

Perhaps we should just ask for opinions from the user list based on what 
you already have - just to gauge the reaction on different points. 
Unless someone responds shortly, we could take a year waiting to shake 
it out.
The threat of sending it should prompt anyone with any issues to speak up.

I think we should add though:
  * explicitly what has changed (eg if we switch something, what was the 
    policy before - most users won't even know)
  * an overview of why we are interested in relaxing back compat

- Mark

Shai Erera wrote:
> Ok, so digging back in this thread, I think the following proposals 
> were made (if I missed some, please add them):
>
> 1. API deprecations last *at least* one full minor release. Example: if 
> we deprecate an API in 2.4, we can remove it in 2.5. BUT, we are also 
> free to keep it there and remove it in 2.6, 2.9, 3.0, 3.5. I would 
> like to reserve that option for controversial deprecations, like 
> TokenStream, and maybe even the HitCollector recent changes. Those 
> that we feel will have a large impact on the users, we might want to 
> keep around for a bit longer until we get enough feedback from the 
> field and are more confident with that change.
>
> 2. Bugs are fixed backwards on the last "dot" release only. Example: a 
> bug that's discovered after 2.4 is released is fixed on the 2.4.X branch. 
> Once 2.5 is released, any bug fixes happen on trunk and 2.5.X. A 
> slight relaxation would be adding something like "we may still fix 
> bugs on the 2.4.X branch if we feel it's important enough", for 
> example if 2.5 contains a lot of API changes and we think a 
> considerable portion of our users are still on 2.4.
>
> 3. Jar drop-in ability is only guaranteed on point releases (this is 
> partly an outcome of (1) and (2), but (6) will also affect it).
>
> 4. Changes to the index format last at least one full major release. 
> Example: a change to the index format in 2.X, is supported in all 3.Y 
> releases, and removed in 4.0. Again, I write "at least" since we 
> should have the freedom to extend support for a particular change.
>
> 5. Changes to the default settings are allowed between minor releases, 
> provided that we give the users a way to revert back to the old 
> behavior. Examples are LUCENE-1542 and the latest issues Mike opened. 
> Those changes will be applied out-of-the-box. The provided API to 
> revert to the old behavior may be a supported API, or a deprecated 
> API. For deprecation we can decide to keep the API longer than one 
> minor release.
>
> 5.1) An exception to (5) are bug fixes which break back-compat - those 
> are always visible, w/ a way to revert to the buggy behavior. That way 
> may be deprecated or not, and its support lifetime can be made on a 
> case-by-case basis.
>
> 6. Minor changes to APIs can happen w/o any deprecation. Example: 
> LUCENE-1614, adding one or two methods to an interface with good 
> documentation and a trivial proposed implementation, etc.
>
> You will notice that almost every proposal has a "we may decide to 
> keep it for longer" - I wrote it following one of the early responses 
> on this thread (I think it was Grant's) - we should not attempt to set 
> things in stone. Our back-compat policy should ensure some level of 
> SLA to our users, but otherwise we should not act as robots, and if we 
> think a certain case requires a different handling than the policy 
> states (only for the user's benefit though), it should be done that 
> way. The burden is still put on the committers, only now the policy is 
> relaxed a bit, and handles different cases in different ways, and the 
> committers/contributors don't need to feel that their hands are tied.
>
> These set the ground/basis, but otherwise we should decide on a 
> case-by-case basis on any extension/relaxation of the policy, for our 
> users' benefits. After quite some time I've been following the 
> discussions on this mailing list, I don't remember ever seeing an 
> issue being driven against our users' benefit. All issues attempt to 
> improve Lucene's performance and our users' experience (end users as 
> well as search application developers). I think it's only fair to ask 
> this "users" community be more forgiving and open to make changes on 
> their side too, making the life of the committers/contributors a bit 
> easier.
>
> I also agree that the next step would be taking this to java-user and 
> get a sense of whether our "users" community agree with those changes 
> or not. I hope that the above summary captures what's needed to be 
> sent to this list.
>
> Shai
>
> On Sat, May 30, 2009 at 2:21 PM, Michael McCandless 
> <lucene@mikemccandless.com> wrote:
>
>     Actually, I think this is a common, and in fact natural/expected
>     occurrence in open-source.  When a tricky topic is discussed, and the
>     opinions are often divergent, frequently the conversation never
>     "converges" to a consensus and the discussion dies.  Only if
>     discussion reaches a semblance of consensus do we vote on it.
>
>     It's exactly like what happens when a controversial bill tries to go
>     through the US congress.  It's heavily discussed and then dies off
>     from lack of consensus, or, it gets far enough to be voted on.
>
>     Ie, this is completely normal for open source.
>
>     We may not like it, we may consider it inefficient, annoying,
>     frustrating, whatever, but this is in fact a reality of all healthy
>     open-source projects.
>
>     Consensus building is not easy, and if the number of people trying to
>     build consensus, by iterating on the proposal, compromising,
>     suggesting alternatives when others dislike an approach, etc., is
>     dwarfed by the number of people objecting to the proposal, then
>     consensus never emerges.
>
>     In this case specifically, I had a rather singular goal: the freedom
>     to make changes to defaults inside Lucene to always favor new users,
>     while not hurting back-compat users.  I intentionally proposed no
>     changes to our back-compat policy (knowing reaching consensus would be
>     that much more difficult).
>
>     The proposal went through several iterations (*settings,
>     *actsAsVersion, etc) that all failed to reach consensus, so we settled
>     back on the current approach of "make the setting explicit" which is
>     an OK workaround.  One by one I've been doing that for the original
>     examples I listed (readOnly IndexReader, NIOFSDir default, etc.)
>
>     But, then, the conversation shifted to a different topic ("how to
>     relax our back-compat policy"), which also failed to reach consensus.
>
>     Maybe, the best way forward is to break out each of the separate
>     bullets and discuss them separately?
>
>     Mike
>
>     On Fri, May 29, 2009 at 11:22 PM, Shai Erera <serera@gmail.com> wrote:
>     > So ... I've this happen a lot of times (especially in my thesis work) -
>     > someone raises a controversial topic, or one that touches the nervous of the
>     > system, there's a flurry of activity and then it dies unexpectedly, even
>     > though it feels to everyone that there's "an extra mile" that should be
>     > taken in order to bring it to completion.
>     >
>     > And that's what I've seen in this thread. A lot has been said - lots of
>     > comments, ideas, opinions. Lots of ranting and complaining. Then it died ...
>     > Thank you Grant for that last "beep", I hope that was an intention to
>     > resurrect it.
>     >
>     > So I ask - how come that we don't have a decision? Is it because we're
>     > "afraid" to make a decision? (that last sentence is supposed to "tease" the
>     > community, not to pass judgement)
>     >
>     > I'm asking because it seems like everybody pretty much agrees on most of the
>     > suggestions, so why not decide "let's do X, Y and Z" and change the
>     > back-compat page starting from 2.9? If people don't remember the decisions,
>     > I don't mind reiterating them.
>     >
>     > (I also ask because I'd like to take the improvements from LUCENE-1614 to
>     > TermDocs/Positions, PhrasePositions, Spans. All except PhrasePositions are
>     > public interfaces and so it matters if I need to go through creating
>     > abstract classes, with new names, or I can change those interfaces, asking
>     > those that implemented their own TermDocs to modify the code).
>     >
>     > Shai
>     >
>     > On Wed, May 27, 2009 at 10:36 PM, Grant Ingersoll <gsingers@apache.org>
>     > wrote:
>     >>
>     >> So, here's a real, concrete example of the need for case by case back
>     >> compat.  See https://issues.apache.org/jira/browse/LUCENE-1662
>     >>
>     >> It's completely stupid that ExtendedFieldCache even exists.   It is a dumb
>     >> workaround for a made up problem that has nothing to do with real coders
>     >> living in the modern age of development where IDE's make refactoring these
>     >> types of things very cheap.  Namely, the notion that interfaces must never
>     >> change lest every 6-9 months some minute number of users (I'd venture it's
>     >> less than 1% of users) out there, who by any account are completely capable
>     >> of implementing hard core Lucene internals (like extending FieldCache), yet
>     >> are seemingly incapable of reading a CHANGES file with a huge disclaimer in
>     >> it, have to recompile (GASP!) their code and put in a dummy implementation
>     >> of some new interface method.  Yet, here we are with Yonik fixing very real
>     >> problems that are a direct result of coding around back compat. (along with
>     >> a mistake; it took a long time for this issue to even be discovered) that
>     >> very much effect the usability of Lucene and the day to day experience of a
>     >> good number of users.
>     >>
>     >> In other words, the real fix for L-1662 is for ExtFieldCache to be folded
>     >> into FieldCache and for the file to be removed, never to be heard from
>     >> again.
>     >>
>     >> The same can be said for the whole Fieldable issue, but that's a different
>     >> day.
>     >>
>     >> Ranting,
>     >> Grant
>     >>
>     >> ---------------------------------------------------------------------
>     >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>     >> For additional commands, e-mail: java-dev-help@lucene.apache.org
>     >>
>     >
>     >
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>     For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

Ok, so digging back in this thread, I think the following proposals were
made (if I missed some, please add them):

1. API deprecations last *at least* one full minor release. Example: if we
deprecate an API in 2.4, we can remove it in 2.5. BUT, we are also free to
keep it there and remove it in 2.6, 2.9, 3.0, 3.5. I would like to reserve
that option for controversial deprecations, like TokenStream, and maybe even
the recent HitCollector changes. Deprecations that we feel will have a large
impact on users we might want to keep around for a bit longer, until we get
enough feedback from the field and are more confident in the change.

2. Bug fixes are backported to the last "dot" release only. Example: a bug
that's discovered after 2.4 is released is fixed on the 2.4.X branch. Once
2.5 is released, any bug fixes happen on trunk and 2.5.X. A slight relaxation
would be adding something like "we may still fix bugs on the 2.4.X branch if
we feel it's important enough", for example if 2.5 contains a lot of API
changes and we think a considerable portion of our users are still on 2.4.

3. Jar drop-in ability is only guaranteed on point releases (this is
partly an outcome of (1) and (2), but (6) will also affect it).

4. Changes to the index format last at least one full major release.
Example: a change to the index format in 2.X is supported in all 3.Y
releases, and removed in 4.0. Again, I write "at least" since we should have
the freedom to extend support for a particular change.

5. Changes to the default settings are allowed between minor releases,
provided that we give the users a way to revert back to the old behavior.
Examples are LUCENE-1542 and the latest issues Mike opened. Those changes
will be applied out-of-the-box. The provided API to revert to the old
behavior may be a supported API, or a deprecated API. For deprecation we can
decide to keep the API longer than one minor release.

5.1) An exception to (5) is bug fixes which break back-compat - those are
always visible, w/ a way to revert to the buggy behavior. That way may be
deprecated or not, and its support lifetime can be decided on a
case-by-case basis.

6. Minor changes to APIs can happen w/o any deprecation. Example:
LUCENE-1614, adding one or two methods to an interface, with good
documentation and a trivial suggested implementation.
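
To make the trade-off behind (6) concrete, here is a minimal sketch. The
names (DocIterator, AbstractDocIterator, ListDocIterator, skipTo) are
hypothetical stand-ins chosen for illustration, not the actual Lucene types
touched by LUCENE-1614. It shows why adding a method to a public interface
forces implementers to change their code, while an abstract class can grow
a method with a trivial default implementation:

```java
public class InterfaceEvolutionSketch {

    // Hypothetical stand-in for a public interface such as TermDocs.
    // Adding a new abstract method here would break every external
    // implementer at compile time.
    public interface DocIterator {
        boolean next();
        int doc();
    }

    // The back-compat-friendly alternative: an abstract class can gain a
    // new method with a default body, so subclasses written earlier still
    // compile and run unchanged.
    public abstract static class AbstractDocIterator implements DocIterator {
        // Newly added method, implemented only in terms of the old ones.
        public boolean skipTo(int target) {
            while (next()) {
                if (doc() >= target) {
                    return true;
                }
            }
            return false;
        }
    }

    // A "user" subclass written before skipTo existed; it knows nothing
    // about the new method.
    public static class ListDocIterator extends AbstractDocIterator {
        private final int[] docs;
        private int pos = -1;

        public ListDocIterator(int... docs) {
            this.docs = docs;
        }

        public boolean next() {
            return ++pos < docs.length;
        }

        public int doc() {
            return docs[pos];
        }
    }

    // Returns the first doc id >= target, or -1 if there is none.
    public static int firstDocAtOrAfter(int target, int... docs) {
        ListDocIterator it = new ListDocIterator(docs);
        return it.skipTo(target) ? it.doc() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstDocAtOrAfter(5, 1, 3, 7, 9)); // prints 7
    }
}
```

Under (6), the interface itself could simply gain the new method, and the
few users who implement it would add a body much like the default shown
above when they recompile.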

You will notice that almost every proposal has a "we may decide to keep it
for longer" - I wrote it following one of the early responses on this thread
(I think it was Grant's) - we should not attempt to set things in stone. Our
back-compat policy should ensure some level of SLA to our users, but
otherwise we should not act as robots, and if we think a certain case
requires a different handling than the policy states (only for the user's
benefit though), it should be done that way. The burden is still put on the
committers, only now the policy is relaxed a bit, and handles different
cases in different ways, and the committers/contributors don't need to feel
that their hands are tied.

These set the ground rules, but otherwise we should decide on a case-by-case
basis on any extension/relaxation of the policy, for our users' benefit.
In all the time I've been following the discussions on this mailing list, I
don't remember ever seeing an issue being driven against our users' benefit.
All issues attempt to improve Lucene's performance and our users' experience
(end users as well as search application developers). I think it's only fair
to ask this "users" community to be more forgiving and open to making changes
on their side too, making the life of the committers/contributors a bit
easier.

I also agree that the next step would be to take this to java-user and get a
sense of whether our "users" community agrees with those changes or not. I
hope that the above summary captures what needs to be sent to that list.

Shai

On Sat, May 30, 2009 at 2:21 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Actually, I think this is a common, and in fact natural/expected
> occurrence in open-source.  When a tricky topic is discussed, and the
> opinions are often divergent, frequently the conversation never
> "converges" to a consensus and the discussion dies.  Only if
> discussion reaches a semblance of consensus do we vote on it.
>
> It's exactly like what happens when a controversial bill tries to go
> through the US congress.  It's heavily discussed and then dies off
> from lack of consensus, or, it gets far enough to be voted on.
>
> Ie, this is completely normal for open source.
>
> We may not like it, we may consider it inefficient, annoying,
> frustrating, whatever, but this is in fact a reality of all healthy
> open-source projects.
>
> Consensus building is not easy, and if the number of people trying to
> build consensus, by iterating on the proposal, compromising,
> suggesting alternatives when others dislike an approach, etc., is
> dwarfed by the number of people objecting to the proposal, then
> consensus never emerges.
>
> In this case specifically, I had a rather singular goal: the freedom
> to make changes to defaults inside Lucene to always favor new users,
> while not hurting back-compat users.  I intentionally proposed no
> changes to our back-compat policy (knowing reaching consensus would be
> that much more difficult).
>
> The proposal went through several iterations (*settings,
> *actsAsVersion, etc) that all failed to reach consensus, so we settled
> back on the current approach of "make the setting explicit" which is
> an OK workaround.  One by one I've been doing that for the original
> examples I listed (readOnly IndexReader, NIOFSDir default, etc.)
>
> But, then, the conversation shifted to a different topic ("how to
> relax our back-compat policy"), which also failed to reach consensus.
>
> Maybe, the best way forward is to break out each of the separate
> bullets and discuss them separately?
>
> Mike
>
> On Fri, May 29, 2009 at 11:22 PM, Shai Erera <se...@gmail.com> wrote:
> > So ... I've this happen a lot of times (especially in my thesis work) -
> > someone raises a controversial topic, or one that touches the nervous of the
> > system, there's a flurry of activity and then it dies unexpectedly, even
> > though it feels to everyone that there's "an extra mile" that should be
> > taken in order to bring it to completion.
> >
> > And that's what I've seen in this thread. A lot has been said - lots of
> > comments, ideas, opinions. Lots of ranting and complaining. Then it died ...
> > Thank you Grant for that last "beep", I hope that was an intention to
> > resurrect it.
> >
> > So I ask - how come that we don't have a decision? Is it because we're
> > "afraid" to make a decision? (that last sentence is supposed to "tease" the
> > community, not to pass judgement)
> >
> > I'm asking because it seems like everybody pretty much agrees on most of the
> > suggestions, so why not decide "let's do X, Y and Z" and change the
> > back-compat page starting from 2.9? If people don't remember the decisions,
> > I don't mind reiterating them.
> >
> > (I also ask because I'd like to take the improvements from LUCENE-1614 to
> > TermDocs/Positions, PhrasePositions, Spans. All except PhrasePositions are
> > public interfaces and so it matters if I need to go through creating
> > abstract classes, with new names, or I can change those interfaces, asking
> > those that implemented their own TermDocs to modify the code).
> >
> > Shai
> >
> > On Wed, May 27, 2009 at 10:36 PM, Grant Ingersoll <gs...@apache.org>
> > wrote:
> >>
> >> So, here's a real, concrete example of the need for case by case back
> >> compat.  See https://issues.apache.org/jira/browse/LUCENE-1662
> >>
> >> It's completely stupid that ExtendedFieldCache even exists.   It is a dumb
> >> workaround for a made up problem that has nothing to do with real coders
> >> living in the modern age of development where IDE's make refactoring these
> >> types of things very cheap.  Namely, the notion that interfaces must never
> >> change lest every 6-9 months some minute number of users (I'd venture it's
> >> less than 1% of users) out there, who by any account are completely capable
> >> of implementing hard core Lucene internals (like extending FieldCache), yet
> >> are seemingly incapable of reading a CHANGES file with a huge disclaimer in
> >> it, have to recompile (GASP!) their code and put in a dummy implementation
> >> of some new interface method.  Yet, here we are with Yonik fixing very real
> >> problems that are a direct result of coding around back compat. (along with
> >> a mistake; it took a long time for this issue to even be discovered) that
> >> very much effect the usability of Lucene and the day to day experience of a
> >> good number of users.
> >>
> >> In other words, the real fix for L-1662 is for ExtFieldCache to be folded
> >> into FieldCache and for the file to be removed, never to be heard from
> >> again.
> >>
> >> The same can be said for the whole Fieldable issue, but that's a different
> >> day.
> >>
> >> Ranting,
> >> Grant
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-dev-help@lucene.apache.org
> >>
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

I think one conclusion that did come of this discussion was that bugs
should be fixed even if the fix breaks backward compatibility.

-- DM

On May 30, 2009, at 7:21 AM, Michael McCandless wrote:

> Actually, I think this is a common, and in fact natural/expected
> occurrence in open-source.  When a tricky topic is discussed, and the
> opinions are often divergent, frequently the conversation never
> "converges" to a consensus and the discussion dies.  Only if
> discussion reaches a semblance of consensus do we vote on it.
>
> It's exactly like what happens when a controversial bill tries to go
> through the US congress.  It's heavily discussed and then dies off
> from lack of consensus, or, it gets far enough to be voted on.
>
> Ie, this is completely normal for open source.
>
> We may not like it, we may consider it inefficient, annoying,
> frustrating, whatever, but this is in fact a reality of all healthy
> open-source projects.
>
> Consensus building is not easy, and if the number of people trying to
> build consensus, by iterating on the proposal, compromising,
> suggesting alternatives when others dislike an approach, etc., is
> dwarfed by the number of people objecting to the proposal, then
> consensus never emerges.
>
> In this case specifically, I had a rather singular goal: the freedom
> to make changes to defaults inside Lucene to always favor new users,
> while not hurting back-compat users.  I intentionally proposed no
> changes to our back-compat policy (knowing reaching consensus would be
> that much more difficult).
>
> The proposal went through several iterations (*settings,
> *actsAsVersion, etc) that all failed to reach consensus, so we settled
> back on the current approach of "make the setting explicit" which is
> an OK workaround.  One by one I've been doing that for the original
> examples I listed (readOnly IndexReader, NIOFSDir default, etc.)
>
> But, then, the conversation shifted to a different topic ("how to
> relax our back-compat policy"), which also failed to reach consensus.
>
> Maybe, the best way forward is to break out each of the separate
> bullets and discuss them separately?
>
> Mike
>
> On Fri, May 29, 2009 at 11:22 PM, Shai Erera <se...@gmail.com> wrote:
>> So ... I've this happen a lot of times (especially in my thesis  
>> work) -
>> someone raises a controversial topic, or one that touches the  
>> nervous of the
>> system, there's a flurry of activity and then it dies unexpectedly,  
>> even
>> though it feels to everyone that there's "an extra mile" that  
>> should be
>> taken in order to bring it to completion.
>>
>> And that's what I've seen in this thread. A lot has been said -  
>> lots of
>> comments, ideas, opinions. Lots of ranting and complaining. Then it  
>> died ...
>> Thank you Grant for that last "beep", I hope that was an intention to
>> resurrect it.
>>
>> So I ask - how come that we don't have a decision? Is it because  
>> we're
>> "afraid" to make a decision? (that last sentence is supposed to  
>> "tease" the
>> community, not to pass judgement)
>>
>> I'm asking because it seems like everybody pretty much agrees on  
>> most of the
>> suggestions, so why not decide "let's do X, Y and Z" and change the
>> back-compat page starting from 2.9? If people don't remember the  
>> decisions,
>> I don't mind reiterating them.
>>
>> (I also ask because I'd like to take the improvements from  
>> LUCENE-1614 to
>> TermDocs/Positions, PhrasePositions, Spans. All except  
>> PhrasePositions are
>> public interfaces and so it matters if I need to go through creating
>> abstract classes, with new names, or I can change those interfaces,  
>> asking
>> those that implemented their own TermDocs to modify the code).
>>
>> Shai
>>
>> On Wed, May 27, 2009 at 10:36 PM, Grant Ingersoll <gsingers@apache.org>
>> wrote:
>>>
>>> So, here's a real, concrete example of the need for case by case  
>>> back
>>> compat.  See https://issues.apache.org/jira/browse/LUCENE-1662
>>>
>>> It's completely stupid that ExtendedFieldCache even exists.   It  
>>> is a dumb
>>> workaround for a made up problem that has nothing to do with real  
>>> coders
>>> living in the modern age of development where IDE's make  
>>> refactoring these
>>> types of things very cheap.  Namely, the notion that interfaces  
>>> must never
>>> change lest every 6-9 months some minute number of users (I'd  
>>> venture it's
>>> less than 1% of users) out there, who by any account are  
>>> completely capable
>>> of implementing hard core Lucene internals (like extending  
>>> FieldCache), yet
>>> are seemingly incapable of reading a CHANGES file with a huge  
>>> disclaimer in
>>> it, have to recompile (GASP!) their code and put in a dummy  
>>> implementation
>>> of some new interface method.  Yet, here we are with Yonik fixing  
>>> very real
>>> problems that are a direct result of coding around back compat.  
>>> (along with
>>> a mistake; it took a long time for this issue to even be  
>>> discovered) that
>>> very much effect the usability of Lucene and the day to day  
>>> experience of a
>>> good number of users.
>>>
>>> In other words, the real fix for L-1662 is for ExtFieldCache to be  
>>> folded
>>> into FieldCache and for the file to be removed, never to be heard  
>>> from
>>> again.
>>>
>>> The same can be said for the whole Fieldable issue, but that's a  
>>> different
>>> day.
>>>
>>> Ranting,
>>> Grant
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

Actually, I think this is a common, and in fact natural/expected
occurrence in open-source.  When a tricky topic is discussed, and the
opinions are often divergent, frequently the conversation never
"converges" to a consensus and the discussion dies.  Only if
discussion reaches a semblance of consensus do we vote on it.

It's exactly like what happens when a controversial bill tries to go
through the US congress.  It's heavily discussed and then dies off
from lack of consensus, or, it gets far enough to be voted on.

Ie, this is completely normal for open source.

We may not like it, we may consider it inefficient, annoying,
frustrating, whatever, but this is in fact a reality of all healthy
open-source projects.

Consensus building is not easy, and if the number of people trying to
build consensus, by iterating on the proposal, compromising,
suggesting alternatives when others dislike an approach, etc., is
dwarfed by the number of people objecting to the proposal, then
consensus never emerges.

In this case specifically, I had a rather singular goal: the freedom
to make changes to defaults inside Lucene to always favor new users,
while not hurting back-compat users.  I intentionally proposed no
changes to our back-compat policy (knowing reaching consensus would be
that much more difficult).

The proposal went through several iterations (*settings,
*actsAsVersion, etc) that all failed to reach consensus, so we settled
back on the current approach of "make the setting explicit" which is
an OK workaround.  One by one I've been doing that for the original
examples I listed (readOnly IndexReader, NIOFSDir default, etc.)
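
As a rough illustration of what "make the setting explicit" means, here is
a minimal sketch. The Reader class and its open methods are hypothetical
names invented for this example, not the actual IndexReader API. The old
entry point keeps its frozen default for back-compat users, while a new
overload makes the caller state the setting, so no default has to change:

```java
public class ExplicitSettingSketch {

    // Hypothetical reader type, used only to illustrate the pattern.
    public static final class Reader {
        private final boolean readOnly;

        private Reader(boolean readOnly) {
            this.readOnly = readOnly;
        }

        /**
         * Old entry point. Its default (read-write) stays frozen so that
         * existing callers see no behavior change.
         */
        @Deprecated
        public static Reader open(String path) {
            return new Reader(false);
        }

        /**
         * New entry point. The caller must say which behavior they want,
         * so there is no default left to argue about.
         */
        public static Reader open(String path, boolean readOnly) {
            return new Reader(readOnly);
        }

        public boolean isReadOnly() {
            return readOnly;
        }
    }

    public static void main(String[] args) {
        // Existing code keeps the old behavior.
        System.out.println(Reader.open("/tmp/index").isReadOnly());       // prints false
        // New code opts in explicitly.
        System.out.println(Reader.open("/tmp/index", true).isReadOnly()); // prints true
    }
}
```

The cost of this workaround is that new users only get the better behavior
if the docs and examples steer them to the explicit overload.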

But, then, the conversation shifted to a different topic ("how to
relax our back-compat policy"), which also failed to reach consensus.

Maybe, the best way forward is to break out each of the separate
bullets and discuss them separately?

Mike

On Fri, May 29, 2009 at 11:22 PM, Shai Erera <se...@gmail.com> wrote:
> So ... I've this happen a lot of times (especially in my thesis work) -
> someone raises a controversial topic, or one that touches the nervous of the
> system, there's a flurry of activity and then it dies unexpectedly, even
> though it feels to everyone that there's "an extra mile" that should be
> taken in order to bring it to completion.
>
> And that's what I've seen in this thread. A lot has been said - lots of
> comments, ideas, opinions. Lots of ranting and complaining. Then it died ...
> Thank you Grant for that last "beep", I hope that was an intention to
> resurrect it.
>
> So I ask - how come that we don't have a decision? Is it because we're
> "afraid" to make a decision? (that last sentence is supposed to "tease" the
> community, not to pass judgement)
>
> I'm asking because it seems like everybody pretty much agrees on most of the
> suggestions, so why not decide "let's do X, Y and Z" and change the
> back-compat page starting from 2.9? If people don't remember the decisions,
> I don't mind reiterating them.
>
> (I also ask because I'd like to take the improvements from LUCENE-1614 to
> TermDocs/Positions, PhrasePositions, Spans. All except PhrasePositions are
> public interfaces and so it matters if I need to go through creating
> abstract classes, with new names, or I can change those interfaces, asking
> those that implemented their own TermDocs to modify the code).
>
> Shai
>
> On Wed, May 27, 2009 at 10:36 PM, Grant Ingersoll <gs...@apache.org>
> wrote:
>>
>> So, here's a real, concrete example of the need for case by case back
>> compat.  See https://issues.apache.org/jira/browse/LUCENE-1662
>>
>> It's completely stupid that ExtendedFieldCache even exists.   It is a dumb
>> workaround for a made up problem that has nothing to do with real coders
>> living in the modern age of development where IDE's make refactoring these
>> types of things very cheap.  Namely, the notion that interfaces must never
>> change lest every 6-9 months some minute number of users (I'd venture it's
>> less than 1% of users) out there, who by any account are completely capable
>> of implementing hard core Lucene internals (like extending FieldCache), yet
>> are seemingly incapable of reading a CHANGES file with a huge disclaimer in
>> it, have to recompile (GASP!) their code and put in a dummy implementation
>> of some new interface method.  Yet, here we are with Yonik fixing very real
>> problems that are a direct result of coding around back compat. (along with
>> a mistake; it took a long time for this issue to even be discovered) that
>> very much effect the usability of Lucene and the day to day experience of a
>> good number of users.
>>
>> In other words, the real fix for L-1662 is for ExtFieldCache to be folded
>> into FieldCache and for the file to be removed, never to be heard from
>> again.
>>
>> The same can be said for the whole Fieldable issue, but that's a different
>> day.
>>
>> Ranting,
>> Grant
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Lucene's default settings & back compatibility

Posted by Earwin Burrfoot <ea...@gmail.com>.

As far as I understand the policy-making process, someone from the PMC has
to start the vote, and then PMC members should, well, vote.
Without them taking action we can "beep" to our hearts' content
without any consequences.

On Sat, May 30, 2009 at 07:22, Shai Erera <se...@gmail.com> wrote:
> So ... I've this happen a lot of times (especially in my thesis work) -
> someone raises a controversial topic, or one that touches the nervous of the
> system, there's a flurry of activity and then it dies unexpectedly, even
> though it feels to everyone that there's "an extra mile" that should be
> taken in order to bring it to completion.
>
> And that's what I've seen in this thread. A lot has been said - lots of
> comments, ideas, opinions. Lots of ranting and complaining. Then it died ...
> Thank you Grant for that last "beep", I hope that was an intention to
> resurrect it.
>
> So I ask - how come that we don't have a decision? Is it because we're
> "afraid" to make a decision? (that last sentence is supposed to "tease" the
> community, not to pass judgement)
>
> I'm asking because it seems like everybody pretty much agrees on most of the
> suggestions, so why not decide "let's do X, Y and Z" and change the
> back-compat page starting from 2.9? If people don't remember the decisions,
> I don't mind reiterating them.
>
> (I also ask because I'd like to take the improvements from LUCENE-1614 to
> TermDocs/Positions, PhrasePositions, Spans. All except PhrasePositions are
> public interfaces and so it matters if I need to go through creating
> abstract classes, with new names, or I can change those interfaces, asking
> those that implemented their own TermDocs to modify the code).
>
> Shai



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Lucene's default settings & back compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

I think the last piece that is needed is to ask on java-user what
others think.  To do that, I think this discussion needs to be boiled
down to a couple of paragraphs.

-Grant

On May 29, 2009, at 11:22 PM, Shai Erera wrote:

> So ... I've seen this happen a lot of times (especially in my thesis
> work) - someone raises a controversial topic, or one that touches
> a nerve in the system, there's a flurry of activity and then it
> dies unexpectedly, even though it feels to everyone that there's "an  
> extra mile" that should be taken in order to bring it to completion.
>
> And that's what I've seen in this thread. A lot has been said - lots  
> of comments, ideas, opinions. Lots of ranting and complaining. Then  
> it died ... Thank you Grant for that last "beep", I hope that was an  
> intention to resurrect it.
>
> So I ask - how come that we don't have a decision? Is it because  
> we're "afraid" to make a decision? (that last sentence is supposed  
> to "tease" the community, not to pass judgement)
>
> I'm asking because it seems like everybody pretty much agrees on  
> most of the suggestions, so why not decide "let's do X, Y and Z" and  
> change the back-compat page starting from 2.9? If people don't  
> remember the decisions, I don't mind reiterating them.
>
> (I also ask because I'd like to take the improvements from  
> LUCENE-1614 to TermDocs/Positions, PhrasePositions, Spans. All  
> except PhrasePositions are public interfaces and so it matters if I  
> need to go through creating abstract classes, with new names, or I  
> can change those interfaces, asking those that implemented their own  
> TermDocs to modify the code).
>
> Shai

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

So ... I've seen this happen a lot of times (especially in my thesis work) -
someone raises a controversial topic, or one that touches a nerve in the
system, there's a flurry of activity and then it dies unexpectedly, even
though it feels to everyone that there's "an extra mile" that should be
taken in order to bring it to completion.

And that's what I've seen in this thread. A lot has been said - lots of
comments, ideas, opinions. Lots of ranting and complaining. Then it died ...
Thank you Grant for that last "beep", I hope it was intended to
resurrect it.

So I ask - how come we don't have a decision? Is it because we're
"afraid" to make a decision? (that last sentence is supposed to "tease" the
community, not to pass judgement)

I'm asking because it seems like everybody pretty much agrees on most of the
suggestions, so why not decide "let's do X, Y and Z" and change the
back-compat page starting from 2.9? If people don't remember the decisions,
I don't mind reiterating them.

(I also ask because I'd like to take the improvements from LUCENE-1614 to
TermDocs/Positions, PhrasePositions, Spans. All except PhrasePositions are
public interfaces and so it matters if I need to go through creating
abstract classes, with new names, or I can change those interfaces, asking
those that implemented their own TermDocs to modify the code).

Shai
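
(Illustration only, not part of Shai's mail: a minimal Java sketch of the
two options he is weighing. DocEnumV1, AbstractDocEnum, ArrayDocEnum and
skipTo() are hypothetical stand-ins, not Lucene's actual TermDocs API.
Adding a method to a published interface breaks every external implementor
at compile time, while an abstract class can ship the new method with a
default body.)

```java
// Hypothetical names throughout; these are not Lucene's real TermDocs types.

// Option A: a published interface. Adding skipTo() here later would break
// every external implementor at compile time until they add the method.
interface DocEnumV1 {
    boolean next();
    int doc();
}

// Option B: an abstract class. A new method can ship with a default body
// written in terms of the existing ones, so old subclasses keep compiling.
abstract class AbstractDocEnum {
    public abstract boolean next();
    public abstract int doc();

    // Added in a later release: linear-scan default, subclasses may override.
    public boolean skipTo(int target) {
        while (next()) {
            if (doc() >= target) {
                return true;
            }
        }
        return false;
    }
}

// A "user" subclass written before skipTo() existed; it needs no changes.
class ArrayDocEnum extends AbstractDocEnum {
    private final int[] docs;
    private int pos = -1;

    ArrayDocEnum(int... docs) { this.docs = docs; }

    @Override public boolean next() { return ++pos < docs.length; }
    @Override public int doc() { return docs[pos]; }
}

class InterfaceEvolutionDemo {
    public static void main(String[] args) {
        AbstractDocEnum e = new ArrayDocEnum(2, 5, 9, 14);
        System.out.println(e.skipTo(6) + " " + e.doc()); // prints "true 9"
    }
}
```

The cost of the abstract-class route is the one raised in the mail: an
existing interface cannot be converted in place, so the class needs a new
name and callers have to migrate to it.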

On Wed, May 27, 2009 at 10:36 PM, Grant Ingersoll <gs...@apache.org>wrote:

> So, here's a real, concrete example of the need for case by case back
> compat.  See https://issues.apache.org/jira/browse/LUCENE-1662
>
> It's completely stupid that ExtendedFieldCache even exists.  It is a dumb
> workaround for a made-up problem that has nothing to do with real coders
> living in the modern age of development where IDEs make refactoring these
> types of things very cheap.  Namely, the notion that interfaces must never
> change lest every 6-9 months some minute number of users (I'd venture it's
> less than 1% of users) out there, who by any account are completely capable
> of implementing hard-core Lucene internals (like extending FieldCache), yet
> are seemingly incapable of reading a CHANGES file with a huge disclaimer in
> it, have to recompile (GASP!) their code and put in a dummy implementation
> of some new interface method.  Yet, here we are with Yonik fixing very real
> problems that are a direct result of coding around back compat (along with
> a mistake; it took a long time for this issue to even be discovered) that
> very much affect the usability of Lucene and the day-to-day experience of a
> good number of users.
>
> In other words, the real fix for L-1662 is for ExtFieldCache to be folded
> into FieldCache and for the file to be removed, never to be heard from
> again.
>
> The same can be said for the whole Fieldable issue, but that's a different
> day.
>
> Ranting,
> Grant

Re: Lucene's default settings & back compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

So, here's a real, concrete example of the need for case by case back  
compat.  See https://issues.apache.org/jira/browse/LUCENE-1662

It's completely stupid that ExtendedFieldCache even exists.  It is a
dumb workaround for a made-up problem that has nothing to do with real
coders living in the modern age of development where IDEs make
refactoring these types of things very cheap.  Namely, the notion that
interfaces must never change lest every 6-9 months some minute number
of users (I'd venture it's less than 1% of users) out there, who by
any account are completely capable of implementing hard-core Lucene
internals (like extending FieldCache), yet are seemingly incapable of
reading a CHANGES file with a huge disclaimer in it, have to recompile
(GASP!) their code and put in a dummy implementation of some new
interface method.  Yet, here we are with Yonik fixing very real
problems that are a direct result of coding around back compat (along
with a mistake; it took a long time for this issue to even be
discovered) that very much affect the usability of Lucene and the
day-to-day experience of a good number of users.

In other words, the real fix for L-1662 is for ExtFieldCache to be  
folded into FieldCache and for the file to be removed, never to be  
heard from again.

The same can be said for the whole Fieldable issue, but that's a  
different day.

Ranting,
Grant
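
(Illustration only, not part of Grant's mail: a simplified Java picture of
the "extended interface" workaround he is objecting to. ValueCache,
ExtendedValueCache and SimpleCache are hypothetical names, and this does not
reproduce the actual LUCENE-1662 bug or the real FieldCache code. It only
shows why bolting new methods onto a sub-interface pushes instanceof checks
and casts onto every caller.)

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names; not the real FieldCache / ExtendedFieldCache code.

// The original published interface, frozen for back compat.
interface ValueCache {
    int[] getInts(String field);
}

// New capability bolted on as a sub-interface instead of a new method above.
interface ExtendedValueCache extends ValueCache {
    long[] getLongs(String field);
}

class SimpleCache implements ExtendedValueCache {
    private final Map<String, long[]> data = new HashMap<>();

    void put(String field, long... values) { data.put(field, values); }

    @Override public long[] getLongs(String field) { return data.get(field); }

    @Override public int[] getInts(String field) {
        long[] src = data.get(field);
        int[] out = new int[src.length];
        for (int i = 0; i < src.length; i++) out[i] = (int) src[i];
        return out;
    }
}

class ExtendedInterfaceDemo {
    // Every caller that wants longs must test and downcast. A wrapper that
    // only implements ValueCache loses the capability without any compile
    // error, which is the kind of trap the workaround creates.
    static long sumLongs(ValueCache cache, String field) {
        if (!(cache instanceof ExtendedValueCache)) {
            throw new UnsupportedOperationException("cache cannot supply longs");
        }
        long sum = 0;
        for (long v : ((ExtendedValueCache) cache).getLongs(field)) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        SimpleCache cache = new SimpleCache();
        cache.put("price", 10L, 20L, 12L);
        System.out.println(sumLongs(cache, "price")); // prints 42
    }
}
```

Folding getLongs() into the base interface, as the mail proposes for
ExtFieldCache, would remove the cast at the price of a one-time compile
break for external implementors.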

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

OK thanks Shai!

Mike

On Mon, May 25, 2009 at 12:18 AM, Shai Erera <se...@gmail.com> wrote:
> Yes - 1630.
>
> I'll check 1601 and if nothing's left to do I'll cancel/close it

Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

Yes - 1630.

I'll check 1601 and if nothing's left to do I'll cancel/close it

On Sun, May 24, 2009 at 11:25 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Actually, under LUCENE-1601, what more was there to do besides turning
> off scoring when sorting by field, by default?
>
> Is there an issue for adding & mating Scorer.scoresDocsInOrder &
> Collector.acceptsDocsOutOfOrder?
>
> Mike

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

Actually, under LUCENE-1601, what more was there to do besides turning
off scoring when sorting by field, by default?

Is there an issue for adding & mating Scorer.scoresDocsInOrder &
Collector.acceptsDocsOutOfOrder?

Mike
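
(Illustration only, not part of Mike's mail: one way "mating" the two flags
could look. Only the method names scoresDocsInOrder and
acceptsDocsOutOfOrder come from the thread; SimpleScorer, SimpleCollector
and choose() are simplified, hypothetical types, and the idea that the
out-of-order scorer is the preferred one when allowed is an assumption made
for the sketch.)

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified types; not Lucene's Scorer/Collector classes.
interface SimpleCollector {
    boolean acceptsDocsOutOfOrder();
    void collect(int doc);
}

interface SimpleScorer {
    boolean scoresDocsInOrder();
    void score(SimpleCollector collector);
}

class ScorerMatchDemo {
    // Pick the out-of-order scorer only when the collector says it can cope;
    // otherwise fall back to the in-order one.
    static SimpleScorer choose(SimpleCollector c, SimpleScorer inOrder, SimpleScorer outOfOrder) {
        return c.acceptsDocsOutOfOrder() ? outOfOrder : inOrder;
    }

    // Builds a scorer that simply replays the given doc ids.
    static SimpleScorer scorer(boolean inOrder, int... docs) {
        return new SimpleScorer() {
            @Override public boolean scoresDocsInOrder() { return inOrder; }
            @Override public void score(SimpleCollector c) {
                for (int d : docs) c.collect(d);
            }
        };
    }

    // Runs one search and returns the doc ids in the order they were collected.
    static List<Integer> run(boolean collectorAcceptsOutOfOrder) {
        List<Integer> seen = new ArrayList<>();
        SimpleCollector c = new SimpleCollector() {
            @Override public boolean acceptsDocsOutOfOrder() { return collectorAcceptsOutOfOrder; }
            @Override public void collect(int doc) { seen.add(doc); }
        };
        choose(c, scorer(true, 1, 4, 7), scorer(false, 7, 1, 4)).score(c);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [1, 4, 7]
        System.out.println(run(true));  // prints [7, 1, 4]
    }
}
```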

On Sun, May 24, 2009 at 3:22 PM, Shai Erera <se...@gmail.com> wrote:
> I created LUCENE-1601 for that purpose with a fix-version 3.0. I noticed you
> already opened another issue for the scoring only. So we should remove it
> from there (note that there's a TODO in the code, if you plan to change it
> in the new issue you opened). 1601 will still handle the new method added to
> Searcher (and I think there were some other TODOs in the code too).
>
> On Sun, May 24, 2009 at 3:48 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>>
>> I'll open an issue for this and we can discuss under there.
>>
>> And I still need to open issues for the other "change defaults" in my
>> original email.
>>
>> Mike
>>
>> On Sun, May 24, 2009 at 8:11 AM, Shai Erera <se...@gmail.com> wrote:
>> >> I'm tempted to simply make that change by default for 2.9, now.
>> >
>> > Agree !
>> >
>> > Shai
>> >
>> > On Sun, May 24, 2009 at 1:28 PM, Michael McCandless
>> > <lu...@mikemccandless.com> wrote:
>> >>
>> >> On Sun, May 24, 2009 at 2:20 AM, Shai Erera <se...@gmail.com> wrote:
>> >> > One thing I don't fully understand about actsAsVersion (and I know
>> >> > it was said that we may want to drop that approach) - for how long
>> >> > does it stay? I mean, let's take the invalidAcronym. It is a change
>> >> > in back-compat, yes. But for how long are we expected to support
>> >> > it? And if we decide to support it for one minor release, or even
>> >> > one major release, will that ctor be deprecated? (I think it must
>> >> > be deprecated ...)
>> >>
>> >> Well, it's pretty clear actsAsVersion in any form (global static,
>> >> magically stored in index, passed in to ctors of those classes that
>> >> wanted to change settings) has too many objections, so this question
>> >> is somewhat moot.  (Yet, if I had to guess, I think we'd want to
>> >> support it for longer than 1 minor release, especially for settings
>> >> that impact how your index is created).
>> >>
>> >> Forcing a decision on upgrading, by deprecating the old API, is I
>> >> think an adequate workaround in most cases.  New users would not use
>> >> the deprecated API, and the javadocs would strongly call out which
>> >> default is preferred.  Old users would see nothing change, except new
>> >> deprecations, and when they go to fix the deprecations they'd see what
>> >> setting to use to remain back-compat, and also realize what they are
>> >> foregoing by doing so.
>> >>
>> >> > Also, Mike - you suggested coming up with new names for methods to
>> >> > reflect new features (such as a boolean saying whether to score
>> >> > when you sort). This is strongly related to our ability to add
>> >> > methods to interfaces/abstract classes. If we add an abstract
>> >> > method to Searcher with the new boolean, we're breaking back-compat.
>> >>
>> >> Right, though I think on a case by case basis we are in fact willing
>> >> to break this, because the back-compat policy is not set in stone.  EG
>> >> we've added new abstract methods to Searcher.
>> >>
>> >> Still, I think for 2.9 we have to migrate away from all interfaces.
>> >> EG we need a dedicated issue w/ migration patch, to move away from
>> >> Searchable.
>> >>
>> >> > Those specific problems (scoring when sorting) came into play only
>> >> > since the introduction of the "fast and easy" search methods
>> >> > (which, if you look at their signatures, are not so fast and easy
>> >> > anymore). If we had just search(Collector, Query) (and maybe a
>> >> > couple of other variants which need to take into account more than
>> >> > just Collector and Query) we wouldn't have that problem.
>> >>
>> >> But I think providing the sugar methods (that create TSDC or TFDC) is
>> >> important:
>> >>
>> >>  search(Query query, int topN)
>> >>
>> >>  search(Query query, int topN, Sort sort)
>> >>
>> >> Simple things should be simple; complex things should be possible.
>> >>
>> >> It's just that that 2nd method should by default do no scoring.  Maybe
>> >> we could simply consider changing that default (w/o adding the new
>> >> API) for 2.9?
>> >>
>> >> > A reviewer, or anyone else, will be required to first create a
>> >> > Collector. They read somewhere that TFC is used for sorting and
>> >> > that it has a bunch of static create() methods. If they don't read
>> >> > it, they at least see a sample somewhere. So they create a TFC and
>> >> > maybe they see a couple of completions to create or not, but at
>> >> > least the changes are local to TFC. We can add more create()
>> >> > variants to TFC w/o breaking back-compat, because TFC is not
>> >> > extendable.
>> >>
>> >> Sure, if we had no sugar methods then any wanting to do a search would
>> >> be forced to be fully explicit.  But the power of good defaults is
>> >> you're not forced to go and make a bunch of decisions on settings that
>> >> have natural defaults.
>> >>
>> >> > Choosing the defaults of each create() is bound to whether we want
>> >> > the defaults to always reflect the best usage (which I prefer). At
>> >> > least in the scoring example, I was under the impression we keep
>> >> > scoring for the sake of back-compat, even if changing it means
>> >> > nothing too bad will happen (we all kind of agree that scoring
>> >> > when sorting is useless, but because of our back-compat policy we
>> >> > can't change it). I think there Grant's proposal to decide on a
>> >> > case-by-case basis would have eliminated scoring when sorting by
>> >> > default.
>> >>
>> >> Right, when sorting by field we should not score, by default.  I'm
>> >> tempted to simply make that change by default for 2.9, now.  When
>> >> compared to, say, changing IndexReader.open to return a readOnly
>> >> reader by default, which I think would mess up a lot of apps, I think
>> >> not scoring by default when sorting by field will have much less of an
>> >> impact.
>> >>
>> >> Mike

Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

I created LUCENE-1601 for that purpose with a fix-version 3.0. I noticed you
already opened another issue for the scoring only. So we should remove it
from there (note that there's a TODO in the code, if you plan to change it
in the new issue you opened). 1601 will still handle the new method added to
Searcher (and I think there were some other TODOs in the code too).

On Sun, May 24, 2009 at 3:48 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> I'll open an issue for this and we can discuss under there.
>
> And I still need to open issues for the other "change defaults" in my
> original email.
>
> Mike
>

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

I'll open an issue for this and we can discuss under there.

And I still need to open issues for the other "change defaults" in my
original email.

Mike

On Sun, May 24, 2009 at 8:11 AM, Shai Erera <se...@gmail.com> wrote:
>> I'm tempted to simply make that change by default for 2.9, now.
>
> Agree !
>
> Shai
>

Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

>
> I'm tempted to simply make that change by default for 2.9, now.
>

Agree !

Shai

On Sun, May 24, 2009 at 1:28 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Sun, May 24, 2009 at 2:20 AM, Shai Erera <se...@gmail.com> wrote:
> > One thing I don't fully understand about actsAsVersion (and I know it was
> > said that we may want to drop that approach) - for how long does it stay?
> I
> > mean, let's take the invalidAcronym. It is a change in back-compat, yes.
> But
> > for how long are we expected to support it? And if we decide to support
> it
> > for one minor release, or even one major release, will that ctor be
> > deprecated? (I think it must be deprecated ...)
>
> Well, it's pretty clear actsAsVersion in any form (global static,
> magically stored in index, passed in to ctors of those classes that
> wanted to change settings) has too many objections, so this question
> is somewhat moot.  (Yet, if I had to guess, I think we'd want to
> support it for longer than 1 minor release, especially for settings
> that impact how your index is created).
>
> Forcing a decision on upgrading, by deprecating the old API, is I
> think an adequate workaround in most cases.  New users would not use
> the deprecated API, and the javadocs would strongly call out which
> default is preferred.  Old users would see nothing change, except new
> deprecations, and when they go to fix the deprecations they'd see what
> setting to use to remain back-compat, and also realize what they are
> foregoing by doing so.
>
> > Also, Mike - you suggested coming up with newer names to methods to
> reflect
> > new features (such as a boolean saying whether to score when you sort).
> This
> > is strongly related to our ability to add methods to interfaces/abstract
> > classes. If we add an abstract method to Searcher with the new boolean,
> > we're breaking back-compat.
>
> Right, though I think on a case by case basis we are in fact willing
> to break this, because the back-compat policy is not set in stone.  EG
> we've added new abstract methods to Searcher.
>
> Still, I think for 2.9 we have to migrate away from all interfaces.
> EG we need a dedicated issue w/ migration patch, to move away from
> Searchable.
>
> > Those specific problems (scoring when sorting) came into play only since
> the
> > introduction of the "fast and easy" search methods (which if you look at
> > their signature - they are not so fast and easy anymore). If we had just
> > search(Collector, Query) (and maybe a couple other variants which need to
> > take into account more than just Collector and Query) you won't have that
> > problem.
>
> But I think providing the sugar methods (that create TSDC or TFDC) is
> important:
>
>  search(Query query, int topN)
>
>  search(Query query, int topN, Sort sort)
>
> Simple things should be simple; complex things should be possible.
>
> It's just that that 2nd method should by default do no scoring.  Maybe
> we could simply consider changing that default (w/o adding the new
> API) for 2.9?
>
> > A reviewer, or anyone else will be required to first create a Collector.
> > They read somewhere that TFC is used for sorting and that it has a bunch
> of
> > static create() methods. If they don't read it, they at least see a
> sample
> > somewhere. So they create a TFC and maybe they see a couple of
> completions
> > to create or not, but at least the changes are local to TFC. We can add
> more
> > create() variants to TFC w/o breaking back-compat, because TFC is not
> > extensible.
>
> Sure, if we had no sugar methods then anyone wanting to do a search would
> be forced to be fully explicit.  But the power of good defaults is
> you're not forced to go and make a bunch of decisions on settings that
> have natural defaults.
>
> > Choosing the defaults of each create() is bound to whether we want the
> > defaults to always reflect the best usage (which I prefer). At least in
> the
> > scoring example, I was under the impression we keep scoring for the sake
> of
> > back-compat, even if by changing it, it means nothing too bad will happen
> > (we all kind of agree that scoring when sorting is useless, but because
> of
> > our back-compat policy we can't change it). I think there Grant's
> proposal
> > to decide on a case-by-case basis would have eliminated scoring when
> sorting
> > by default.
>
> Right, when sorting by field we should not score, by default.  I'm
> tempted to simply make that change by default for 2.9, now.  When
> compared to, say, changing IndexReader.open to return a readOnly
> reader by default, which I think would mess up a lot of apps, I think
> not scoring by default when sorting by field will have much less of an
> impact.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Sun, May 24, 2009 at 2:20 AM, Shai Erera <se...@gmail.com> wrote:
> One thing I don't fully understand about actsAsVersion (and I know it was
> said that we may want to drop that approach) - for how long does it stay? I
> mean, let's take the invalidAcronym. It is a change in back-compat, yes. But
> for how long are we expected to support it? And if we decide to support it
> for one minor release, or even one major release, will that ctor be
> deprecated? (I think it must be deprecated ...)

Well, it's pretty clear actsAsVersion in any form (global static,
magically stored in index, passed in to ctors of those classes that
wanted to change settings) has too many objections, so this question
is somewhat moot.  (Yet, if I had to guess, I think we'd want to
support it for longer than 1 minor release, especially for settings
that impact how your index is created).
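
To make the "passed in to ctors" variant concrete, here is a minimal
sketch in plain Java.  All names (CompatVersion, ToyAnalyzer, the
dot-splitting rule) are hypothetical and exist only for illustration;
this is not Lucene's API, just one way a version argument could pin
behavior:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: this is a toy, not Lucene's API.
enum CompatVersion { V_2_4, V_2_9 }

final class ToyAnalyzer {
    private final boolean splitOnDots;

    // The caller must name the release whose behavior they want to match.
    ToyAnalyzer(CompatVersion matchVersion) {
        // Assumption for the example: splitting on dots was introduced in 2.9.
        this.splitOnDots = matchVersion.compareTo(CompatVersion.V_2_9) >= 0;
    }

    List<String> tokenize(String text) {
        String pattern = splitOnDots ? "[\\s.]+" : "\\s+";
        List<String> tokens = new ArrayList<>();
        for (String t : text.split(pattern)) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}
```

An app that passes V_2_4 keeps the old tokenization after upgrading,
while a new app passing V_2_9 gets the newer behavior.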

Forcing a decision on upgrading, by deprecating the old API, is I
think an adequate workaround in most cases.  New users would not use
the deprecated API, and the javadocs would strongly call out which
default is preferred.  Old users would see nothing change, except new
deprecations, and when they go to fix the deprecations they'd see what
setting to use to remain back-compat, and also realize what they are
foregoing by doing so.
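
As a rough sketch of the deprecation route (ToyReader is a made-up
class for illustration, not Lucene's IndexReader): the old no-arg
method keeps its old default and is deprecated, and the replacement
makes the caller choose explicitly:

```java
// Hypothetical toy class; names are illustrative, not Lucene's API.
final class ToyReader {
    final boolean readOnly;

    private ToyReader(boolean readOnly) {
        this.readOnly = readOnly;
    }

    /**
     * @deprecated Use {@link #open(boolean)} and choose explicitly.
     *             Pass false to keep the old behavior; true is preferred.
     */
    @Deprecated
    static ToyReader open() {
        // Old default preserved, so existing callers see nothing change.
        return open(false);
    }

    // New entry point: no hidden default, the caller decides.
    static ToyReader open(boolean readOnly) {
        return new ToyReader(readOnly);
    }
}
```

Existing code keeps compiling (with a deprecation warning), and the
javadoc on the deprecated method is where the preferred setting gets
called out.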

> Also, Mike - you suggested coming up with newer names to methods to reflect
> new features (such as a boolean saying whether to score when you sort). This
> is strongly related to our ability to add methods to interfaces/abstract
> classes. If we add an abstract method to Searcher with the new boolean,
> we're breaking back-compat.

Right, though I think on a case by case basis we are in fact willing
to break this, because the back-compat policy is not set in stone.  EG
we've added new abstract methods to Searcher.

Still, I think for 2.9 we have to migrate away from all interfaces.
EG we need a dedicated issue w/ migration patch, to move away from
Searchable.

> Those specific problems (scoring when sorting) came into play only since the
> introduction of the "fast and easy" search methods (which if you look at
> their signature - they are not so fast and easy anymore). If we had just
> search(Collector, Query) (and maybe a couple other variants which need to
> take into account more than just Collector and Query) you won't have that
> problem.

But I think providing the sugar methods (that create TSDC or TFDC) is important:

  search(Query query, int topN)

  search(Query query, int topN, Sort sort)

Simple things should be simple; complex things should be possible.

It's just that that 2nd method should by default do no scoring.  Maybe
we could simply consider changing that default (w/o adding the new
API) for 2.9?
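
A sketch of that layering, using made-up toy types (ToySearcher,
ToyCollector and Hit are assumptions for illustration, not Lucene's
Searcher or Collector classes).  The sugar methods delegate to the
fully explicit collector method, and the sorted variant asks for no
scores by default:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy types; not Lucene's API.
final class Hit {
    final String id;
    final float score; // NaN when scoring was skipped

    Hit(String id, float score) {
        this.id = id;
        this.score = score;
    }
}

interface ToyCollector {
    boolean needsScores();
    void collect(String id, float score);
}

final class ToySearcher {
    private final List<String> docs;
    int scoreComputations = 0; // counts how often a score was computed

    ToySearcher(List<String> docs) {
        this.docs = docs;
    }

    // Fully explicit variant: the caller supplies the collector.
    void search(String term, ToyCollector c) {
        for (String d : docs) {
            if (!d.contains(term)) {
                continue;
            }
            float score = Float.NaN;
            if (c.needsScores()) {
                scoreComputations++;
                score = 1.0f / d.length(); // toy relevance: shorter doc wins
            }
            c.collect(d, score);
        }
    }

    // Sugar: top N by relevance, so scores are required.
    List<Hit> search(String term, int topN) {
        return collectTop(term, topN, true,
                (a, b) -> Float.compare(b.score, a.score));
    }

    // Sugar: top N by a sort on the doc id; no scoring by default.
    List<Hit> search(String term, int topN, Comparator<String> sort) {
        return collectTop(term, topN, false,
                (a, b) -> sort.compare(a.id, b.id));
    }

    private List<Hit> collectTop(String term, int topN, boolean scores,
                                 Comparator<Hit> order) {
        List<Hit> hits = new ArrayList<>();
        search(term, new ToyCollector() {
            public boolean needsScores() { return scores; }
            public void collect(String id, float score) {
                hits.add(new Hit(id, score));
            }
        });
        hits.sort(order);
        return new ArrayList<>(hits.subList(0, Math.min(topN, hits.size())));
    }
}
```

A caller who does want scores while sorting can still pass their own
collector to the explicit method; the common case stays a one-liner.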

> A reviewer, or anyone else will be required to first create a Collector.
> They read somewhere that TFC is used for sorting and that it has a bunch of
> static create() methods. If they don't read it, they at least see a sample
> somewhere. So they create a TFC and maybe they see a couple of completions
> to create or not, but at least the changes are local to TFC. We can add more
> create() variants to TFC w/o breaking back-compat, because TFC is not
> extensible.

Sure, if we had no sugar methods then anyone wanting to do a search would
be forced to be fully explicit.  But the power of good defaults is
you're not forced to go and make a bunch of decisions on settings that
have natural defaults.

> Choosing the defaults of each create() is bound to whether we want the
> defaults to always reflect the best usage (which I prefer). At least in the
> scoring example, I was under the impression we keep scoring for the sake of
> back-compat, even if by changing it, it means nothing too bad will happen
> (we all kind of agree that scoring when sorting is useless, but because of
> our back-compat policy we can't change it). I think there Grant's proposal
> to decide on a case-by-case basis would have eliminated scoring when sorting
> by default.

Right, when sorting by field we should not score, by default.  I'm
tempted to simply make that change by default for 2.9, now.  When
compared to, say, changing IndexReader.open to return a readOnly
reader by default, which I think would mess up a lot of apps, I think
not scoring by default when sorting by field will have much less of an
impact.

Mike

Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

One thing I don't fully understand about actsAsVersion (and I know it was
said that we may want to drop that approach) - for how long does it stay? I
mean, let's take the invalidAcronym. It is a change in back-compat, yes. But
for how long are we expected to support it? And if we decide to support it
for one minor release, or even one major release, will that ctor be
deprecated? (I think it must be deprecated ...)

Also, Mike - you suggested coming up with newer names to methods to reflect
new features (such as a boolean saying whether to score when you sort). This
is strongly related to our ability to add methods to interfaces/abstract
classes. If we add an abstract method to Searcher with the new boolean,
we're breaking back-compat.

Those specific problems (scoring when sorting) came into play only since the
introduction of the "fast and easy" search methods (which if you look at
their signature - they are not so fast and easy anymore). If we had just
search(Collector, Query) (and maybe a couple other variants which need to
take into account more than just Collector and Query) you wouldn't have that
problem.

A reviewer, or anyone else will be required to first create a Collector.
They read somewhere that TFC is used for sorting and that it has a bunch of
static create() methods. If they don't read it, they at least see a sample
somewhere. So they create a TFC and maybe they see a couple of completions
to create or not, but at least the changes are local to TFC. We can add more
create() variants to TFC w/o breaking back-compat, because TFC is not
extensible.

Choosing the defaults of each create() is bound to whether we want the
defaults to always reflect the best usage (which I prefer). At least in the
scoring example, I was under the impression we keep scoring for the sake of
back-compat, even if by changing it, it means nothing too bad will happen
(we all kind of agree that scoring when sorting is useless, but because of
our back-compat policy we can't change it). I think there Grant's proposal
to decide on a case-by-case basis would have eliminated scoring when sorting
by default.

Shai

On Fri, May 22, 2009 at 11:14 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Fri, May 22, 2009 at 3:37 PM, DM Smith <dm...@gmail.com> wrote:
>
> > So, what is it that they use that leads to such unfavorable results?
>
> I think it's simply that they take each search engine, get it to index
> their collection in the most obvious way, perhaps having read a
> tutorial somewhere, and test that.  I'm guessing they don't spend much
> time tuning any of the search engines for what they are testing.  So
> those with the best defaults make the best impression.  First
> impressions count :)
>
> So eg they don't turn off CFS, don't increase IW's RAM buffer, don't
> turn off scoring when sorting by field, fail to omitTFAP when testing
> "pure boolean" searching, etc.
>
> These tunings are well known to all of us, but to 95% of Lucene users,
> including your casual reviewer, they aren't.
>
> I expect non-reviewers do the same, when they want to try out different
> search engines.  I think it's a small minority of people who actually
> come out to java-user to ask for help, and I bet most "potential new
> users" never discover the tuning tips on the wiki.
>
> (And: I fully agree, said reviewer and said new user *should* do
> their homework and tune each engine to their fullest; likewise,
> readers of such reviews *should* scrutinize whether the testing was
> fair; yet typically they don't).
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Fri, May 22, 2009 at 3:37 PM, DM Smith <dm...@gmail.com> wrote:

> So, what is it that they use that leads to such unfavorable results?

I think it's simply that they take each search engine, get it to index
their collection in the most obvious way, perhaps having read a
tutorial somewhere, and test that.  I'm guessing they don't spend much
time tuning any of the search engines for what they are testing.  So
those with the best defaults make the best impression.  First
impressions count :)

So eg they don't turn off CFS, don't increase IW's RAM buffer, don't
turn off scoring when sorting by field, fail to omitTFAP when testing
"pure boolean" searching, etc.

These tunings are well known to all of us, but to 95% of Lucene users,
including your casual reviewer, they aren't.

I expect non-reviewers do the same, when they want to try out different
search engines.  I think it's a small minority of people who actually
come out to java-user to ask for help, and I bet most "potential new
users" never discover the tuning tips on the wiki.

(And: I fully agree, said reviewer and said new user *should* do
their homework and tune each engine to their fullest; likewise,
readers of such reviews *should* scrutinize whether the testing was
fair; yet typically they don't).

Mike

Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

Michael McCandless wrote:
> Well... I would expect & hope Lucene's adoption is growing with time,
> so the number of new users should increase on each release.  For a
> healthy project that's relatively young compared to its potential user
> base, that growth should be exponential.
>
> And, I'd expect the vast majority of old users don't ever upgrade.
>
> Furthermore, when a reviewer (typically a "new user") tests Lucene
> against other search engines, and fails to check our Wiki for all the
> things we all know you have to do to get good search or indexing
> performance, and then reports in a well-read blog somewhere that
> Lucene's performance isn't great when compared to other search
> engines, and lots of other people read that, cite it, share it, etc.,
> those people are less inclined to try Lucene.  This then stunts
> Lucene's growth.
>   
I would think a reviewer would have to read something other than just 
javadocs to figure out how to set up Lucene. While the javadocs are 
good, and getting better, I did not find them helpful at first. The 
class-at-a-time approach to documentation is too fragmented for me. So, 
what is it that they use that leads to such unfavorable results?

> Yes, we all sit here and say "well that's not a fair review because
> you didn't properly tune Lucene", yet, this kind of thing happens all
> the time.  If Lucene had better defaults out of the box it'd reduce
> how often that happens.
>
> Mike
>
> On Fri, May 22, 2009 at 2:49 PM, DM Smith <dm...@gmail.com> wrote:
>   
>> Michael McCandless wrote:
>>     
>>> On Fri, May 22, 2009 at 2:27 PM, DM Smith <dm...@gmail.com> wrote:
>>>
>>>       
>>>> Marvin Humphrey wrote:
>>>>
>>>>         
>>>>>> I feel the opposite: I'd like new users to see improvements by
>>>>>> default, and users that require strict back-compat to ask for that.
>>>>>>
>>>>>>
>>>>>>             
>>>>> By "strict back-compat", do you mean "people who would like their search
>>>>> app to
>>>>> not fail silently"? ;)  A "new user" who follows your advice...
>>>>>
>>>>>  // haha stupid noob
>>>>>  StandardAnalyzer analyzer = new StandardAnalyzer(Versions.LATEST);
>>>>>
>>>>> ... is going to get screwed when the default tokenization behavior
>>>>> changes.
>>>>> And it would be much worse if we follow my preference for making the arg
>>>>> optional without following my preference for keeping defaults intact:
>>>>>
>>>>>  // haha eat it luser
>>>>>  StandardAnalyzer analyzer = new StandardAnalyzer();
>>>>>
>>>>> It's either make the arg mandatory when changing default behavior and
>>>>> recommend that new users pass a fixed argument, or make it optional but
>>>>> keep
>>>>> defaults intact between major releases.
>>>>>
>>>>>           
>>>> I think I see your point: A new user is such only for the first release
>>>> that
>>>> they use Lucene. For a first use, there is no backward compatibility
>>>> problem. On the use of a subsequent release, their code still gets the
>>>> latest and greatest and now by the choice they were guided to make, they
>>>> may
>>>> have broken backward compatibility.
>>>>
>>>> So for any user, the only safe, thus acceptable, use is to never have
>>>> Versions.LATEST, but only a specific version.
>>>>
>>>>         
>>> Right, we would have to not provide Versions.LATEST, ie if you want
>>> latest, you'd pick Versions.LUCENE_29 (in 2.9).
>>>       
>> Why go to all this trouble for a new user?
>>
>> Let's pretend that there are 1,000 new users every release. After 12
>> releases, there are still only 1000 new users but now 11000 old users.
>>
>> How does it help an old user?
>>
>> Those 11000 old users now have to update their code to Versions.Lucene_301
>> (or whatever the latest is) to get the latest changes, but they are also
>> going to have to understand what that means and figure out what parts of
>> their application now behave in a broken manner. Where are they to go to
>> find out that info? CHANGES.txt?
>>
>> When I was a new user, I had to look at example code, read faqs, wiki,
>> javadoc, java-users .... It was a learning curve, fortunately not steep.
>>
>> Don't those resources need to be maintained so as to match the
>> best/recommended practices? Can't that be the place where new users are
>> informed?
>>
>> -- DM
>>

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

Well... I would expect & hope Lucene's adoption is growing with time,
so the number of new users should increase on each release.  For a
healthy project that's relatively young compared to its potential user
base, that growth should be exponential.

And, I'd expect the vast majority of old users don't ever upgrade.

Furthermore, when a reviewer (typically a "new user") tests Lucene
against other search engines, and fails to check our Wiki for all the
things we all know you have to do to get good search or indexing
performance, and then reports in a well-read blog somewhere that
Lucene's performance isn't great when compared to other search
engines, and lots of other people read that, cite it, share it, etc.,
those people are less inclined to try Lucene.  This then stunts
Lucene's growth.

Yes, we all sit here and say "well that's not a fair review because
you didn't properly tune Lucene", yet, this kind of thing happens all
the time.  If Lucene had better defaults out of the box it'd reduce
how often that happens.

Mike

On Fri, May 22, 2009 at 2:49 PM, DM Smith <dm...@gmail.com> wrote:
> Michael McCandless wrote:
>>
>> On Fri, May 22, 2009 at 2:27 PM, DM Smith <dm...@gmail.com> wrote:
>>
>>>
>>> Marvin Humphrey wrote:
>>>
>>>>>
>>>>> I feel the opposite: I'd like new users to see improvements by
>>>>> default, and users that require strict back-compat to ask for that.
>>>>>
>>>>
>>>> By "strict back-compat", do you mean "people who would like their search
>>>> app to not fail silently"? ;)  A "new user" who follows your advice...
>>>>
>>>>  // haha stupid noob
>>>>  StandardAnalyzer analyzer = new StandardAnalyzer(Versions.LATEST);
>>>>
>>>> ... is going to get screwed when the default tokenization behavior
>>>> changes.
>>>> And it would be much worse if we follow my preference for making the arg
>>>> optional without following my preference for keeping defaults intact:
>>>>
>>>>  // haha eat it luser
>>>>  StandardAnalyzer analyzer = new StandardAnalyzer();
>>>>
>>>> It's either make the arg mandatory when changing default behavior and
>>>> recommend that new users pass a fixed argument, or make it optional but
>>>> keep defaults intact between major releases.
>>>>
>>>
>>> I think I see your point: A new user is such only for the first release
>>> that they use Lucene. For a first use, there is no backward compatibility
>>> problem. On the use of a subsequent release, their code still gets the
>>> latest and greatest and now by the choice they were guided to make, they
>>> may have broken backward compatibility.
>>>
>>> So for any user, the only safe, thus acceptable use is to never have
>>> Versions.LATEST, but only a specific version.
>>>
>>
>> Right, we would have to not provide Versions.LATEST, ie if you want
>> latest, you'd pick Versions.LUCENE_29 (in 2.9).
>
> Why go to all this trouble for a new user?
>
> Let's pretend that there are 1,000 new users every release. After 12
> releases, there are still only 1000 new users but now 11000 old users.
>
> How does it help an old user?
>
> Those 11000 old users now have to update their code to Versions.Lucene_301
> (or whatever the latest is) to get the latest changes, but they are also
> going to have to understand what that means and figure out what parts of
> their application now behave in a broken manner. Where are they to go to
> find out that info? CHANGES.txt?
>
> When I was a new user, I had to look at example code, read faqs, wiki,
> javadoc, java-users .... It was a learning curve, fortunately not steep.
>
> Don't those resources need to be maintained so as to match the
> best/recommended practices? Can't that be the place where new users are
> informed?
>
> -- DM
>

Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

Michael McCandless wrote:
> On Fri, May 22, 2009 at 2:27 PM, DM Smith <dm...@gmail.com> wrote:
>   
>> Marvin Humphrey wrote:
>>     
>>>> I feel the opposite: I'd like new users to see improvements by
>>>> default, and users that require strict back-compat to ask for that.
>>>>
>>> By "strict back-compat", do you mean "people who would like their search
>>> app to not fail silently"? ;)  A "new user" who follows your advice...
>>>
>>>   // haha stupid noob
>>>   StandardAnalyzer analyzer = new StandardAnalyzer(Versions.LATEST);
>>>
>>> ... is going to get screwed when the default tokenization behavior
>>> changes.
>>> And it would be much worse if we follow my preference for making the arg
>>> optional without following my preference for keeping defaults intact:
>>>
>>>   // haha eat it luser
>>>   StandardAnalyzer analyzer = new StandardAnalyzer();
>>>
>>> It's either make the arg mandatory when changing default behavior and
>>> recommend that new users pass a fixed argument, or make it optional but
>>> keep defaults intact between major releases.
>>>
>> I think I see your point: A new user is such only for the first release that
>> they use Lucene. For a first use, there is no backward compatibility
>> problem. On the use of a subsequent release, their code still gets the
>> latest and greatest and now by the choice they were guided to make, they may
>> have broken backward compatibility.
>>
>> So for any user, the only safe, thus acceptable use is to never have
>> Versions.LATEST, but only a specific version.
>>     
>
> Right, we would have to not provide Versions.LATEST, ie if you want
> latest, you'd pick Versions.LUCENE_29 (in 2.9).

Why go to all this trouble for a new user?

Let's pretend that there are 1,000 new users every release. After 12 
releases, there are still only 1000 new users but now 11000 old users.

How does it help an old user?

Those 11000 old users now have to update their code to 
Versions.Lucene_301 (or whatever the latest is) to get the latest 
changes, but they are also going to have to understand what that means 
and figure out what parts of their application now behave in a broken 
manner. Where are they to go to find out that info? CHANGES.txt?

When I was a new user, I had to look at example code, read faqs, wiki, 
javadoc, java-users .... It was a learning curve, fortunately not steep.

Don't those resources need to be maintained so as to match the 
best/recommended practices? Can't that be the place where new users are 
informed?

-- DM


Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Fri, May 22, 2009 at 2:27 PM, DM Smith <dm...@gmail.com> wrote:
> Marvin Humphrey wrote:
>>>
>>> I feel the opposite: I'd like new users to see improvements by
>>> default, and users that require strict back-compat to ask for that.
>>>
>>
>> By "strict back-compat", do you mean "people who would like their search
>> app to not fail silently"? ;)  A "new user" who follows your advice...
>>
>>   // haha stupid noob
>>   StandardAnalyzer analyzer = new StandardAnalyzer(Versions.LATEST);
>>
>> ... is going to get screwed when the default tokenization behavior
>> changes.
>> And it would be much worse if we follow my preference for making the arg
>> optional without following my preference for keeping defaults intact:
>>
>>   // haha eat it luser
>>   StandardAnalyzer analyzer = new StandardAnalyzer();
>>
>> It's either make the arg mandatory when changing default behavior and
>> recommend that new users pass a fixed argument, or make it optional but
>> keep defaults intact between major releases.
>
> I think I see your point: A new user is such only for the first release that
> they use Lucene. For a first use, there is no backward compatibility
> problem. On the use of a subsequent release, their code still gets the
> latest and greatest and now by the choice they were guided to make, they may
> have broken backward compatibility.
>
> So for any user, the only safe, thus acceptable use is to never have
> Versions.LATEST, but only a specific version.

Right, we would have to not provide Versions.LATEST, ie if you want
latest, you'd pick Versions.LUCENE_29 (in 2.9).

Mike


Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

Marvin Humphrey wrote:
>> I feel the opposite: I'd like new users to see improvements by
>> default, and users that require strict back-compat to ask for that.
>>     
>
> By "strict back-compat", do you mean "people who would like their search app to
> not fail silently"? ;)  A "new user" who follows your advice...
>
>    // haha stupid noob 
>    StandardAnalyzer analyzer = new StandardAnalyzer(Versions.LATEST);
>
> ... is going to get screwed when the default tokenization behavior changes.
> And it would be much worse if we follow my preference for making the arg
> optional without following my preference for keeping defaults intact:
>
>    // haha eat it luser 
>    StandardAnalyzer analyzer = new StandardAnalyzer();
>
> It's either make the arg mandatory when changing default behavior and
> recommend that new users pass a fixed argument, or make it optional but keep
> defaults intact between major releases.
I think I see your point: A new user is such only for the first release 
that they use Lucene. For a first use, there is no backward 
compatibility problem. On the use of a subsequent release, their code 
still gets the latest and greatest and now by the choice they were 
guided to make, they may have broken backward compatibility.

So for any user, the only safe, thus acceptable use is to never have 
Versions.LATEST, but only a specific version.

-- DM



Re: Lucene's default settings & back compatibility

Posted by Marvin Humphrey <ma...@rectangular.com>.

> I feel the opposite: I'd like new users to see improvements by
> default, and users that require strict back-compat to ask for that.

By "strict back-compat", do you mean "people who would like their search app to
not fail silently"? ;)  A "new user" who follows your advice...

   // haha stupid noob 
   StandardAnalyzer analyzer = new StandardAnalyzer(Versions.LATEST);

... is going to get screwed when the default tokenization behavior changes.
And it would be much worse if we follow my preference for making the arg
optional without following my preference for keeping defaults intact:

   // haha eat it luser 
   StandardAnalyzer analyzer = new StandardAnalyzer();

It's either make the arg mandatory when changing default behavior and
recommend that new users pass a fixed argument, or make it optional but keep
defaults intact between major releases.

Marvin Humphrey



Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Fri, May 22, 2009 at 12:37 PM, Marvin Humphrey
<ma...@rectangular.com> wrote:

> I still like per-class settings classes.  For instance, an IndexWriterSettings
> class which allows you to hide away all the tweaky stuff that's cluttering up
> the IndexWriter API.
>
>   IndexWriterSettings settings = new IndexWriterSettings("3.1");
>   IndexWriter writer = new IndexWriter("path/to/index", analyzer, settings);

Unfortunately, switching to a separate Settings class is a much bigger
project than the other proposals.

With actsAsVersion per-class, we could add that arg only to the
classes that needed it.  But with Settings, I assume we'd have to go
whole hog and pull out all of IW's settings into this separate class.

There are tricky questions with Settings too.  EG when are you allowed
to change a setting?  Some settings in IW must be known in the ctor
(autoCommit) and can't be changed later.  Others need to propagate on
change, so we'd need some mechanism for IW to be notified when a
specific setting changed.
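
To illustrate the two kinds of settings described above, here is a
minimal sketch in modern Java.  SketchWriterSettings and SketchWriter
are hypothetical names, not Lucene classes, and using maxBufferedDocs
as the "can change later" setting is an assumption made for
illustration only.  One setting is fixed in the ctor; the other
notifies registered listeners when it changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical settings holder; not an existing Lucene class.
final class SketchWriterSettings {
    // Must be known in the constructor and can't change later, so no setter.
    private final boolean autoCommit;
    // May change at any time; registered listeners are told about it.
    private int maxBufferedDocs;
    private final List<IntConsumer> listeners = new ArrayList<>();

    SketchWriterSettings(boolean autoCommit, int maxBufferedDocs) {
        this.autoCommit = autoCommit;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    boolean isAutoCommit() { return autoCommit; }

    int getMaxBufferedDocs() { return maxBufferedDocs; }

    void addMaxBufferedDocsListener(IntConsumer listener) {
        listeners.add(listener);
    }

    void setMaxBufferedDocs(int value) {
        maxBufferedDocs = value;
        // Propagate the change to everyone who registered interest.
        for (IntConsumer listener : listeners) {
            listener.accept(value);
        }
    }
}

// Hypothetical stand-in for a writer that consumes the settings.
final class SketchWriter {
    private final boolean autoCommit;
    private int flushThreshold;

    SketchWriter(SketchWriterSettings settings) {
        this.autoCommit = settings.isAutoCommit();
        this.flushThreshold = settings.getMaxBufferedDocs();
        // Stay in sync with later changes to the mutable setting.
        settings.addMaxBufferedDocsListener(value -> this.flushThreshold = value);
    }

    boolean isAutoCommit() { return autoCommit; }

    int flushThreshold() { return flushThreshold; }
}
```

Even this toy version needs a listener mechanism per mutable setting,
which gives a feel for why the full project is larger than it first
looks.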

I agree Settings also buy us neat possibilities, eg we could make
different Settings for "good search relevance" vs "high search
throughput" vs "high indexing throughput", etc., but this is way
beyond just making sure new users see the best of Lucene.  Using
settings for back compat seems like overkill.

I also see Settings as something of a distraction.  Ie we have lots of
neat features, ideas to work on in Lucene that improve its
functionality, where I'd rather see our effort spent.

> I also think that the argument should be optional rather than mandatory, and
> that defaults should remain stable between major releases.  In other words, to
> take advantage of improved defaults, you need to ask for them -- but new users
> don't have to think about such things during the initial learning phase.

I feel the opposite: I'd like new users to see improvements by
default, and users that require strict back-compat to ask for that.

Mike


Re: Lucene's default settings & back compatibility

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Fri, May 22, 2009 at 11:53:02AM -0400, Michael McCandless wrote:

>   1. If we deprecate an API in the 2.1 release, we can remove it in
>      the next minor release (2.2).
> 
>   2. JAR drop-in-ability is only guaranteed on point releases (2.4.1
>      is a drop-in replacement to 2.4.0).  When switching to a new
>      minor release (2.1 -> 2.2) likely you'll need to recompile.

>   4. [Maybe?] Allow certain limited changes that will require source
>      code changes in your app on upgrading to a new minor release:
>      adding a new method to an interface, adding a new abstract method
>      to an abstract class, renaming of deprecated methods.

These make sense to me.  Catastrophic failure at compile time is vastly
easier to deal with than subtle failure at run time.

>   3. Default settings can change, but if the change is big enough (and
>      certainly if it will impact what's indexed or how searches find
>      docs/do scoring), we add a required "actsAsVersion" arg to the
>      ctor of the affected class.  New users get the latest & greatest,
>      and upgraded users keep their old defaults.

I still like per-class settings classes.  For instance, an IndexWriterSettings
class which allows you to hide away all the tweaky stuff that's cluttering up
the IndexWriter API.

   IndexWriterSettings settings = new IndexWriterSettings("3.1");
   IndexWriter writer = new IndexWriter("path/to/index", analyzer, settings);

I also think that the argument should be optional rather than mandatory, and
that defaults should remain stable between major releases.  In other words, to
take advantage of improved defaults, you need to ask for them -- but new users
don't have to think about such things during the initial learning phase.
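
A rough sketch of that optional-argument variant, under stated
assumptions: SketchIndexWriterSettings is a hypothetical class, "3.0"
as the frozen baseline is an assumption, and so is the idea that a
"merge considers deletes" default flipped in 3.1.  The no-arg ctor
keeps the baseline defaults; passing a version string opts in to newer
ones.

```java
// Hypothetical settings class; the name echoes the example above but this
// is not a real Lucene API.
final class SketchIndexWriterSettings {
    private final boolean mergeConsidersDeletes;

    // No argument: defaults stay frozen for the whole major release line.
    // "3.0" as the baseline is an assumption for this sketch.
    SketchIndexWriterSettings() {
        this("3.0");
    }

    // Explicit version: opt in to the defaults introduced by that release.
    SketchIndexWriterSettings(String version) {
        // Assumption for illustration: 3.1 switched this default on.
        this.mergeConsidersDeletes = compare(version, "3.1") >= 0;
    }

    boolean mergeConsidersDeletes() { return mergeConsidersDeletes; }

    // Compares dotted numeric versions component by component, so that
    // "3.10" sorts after "3.2".
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }
}
```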

This approach is reasonably close to how Architecture and IndexManager are
used to hide away settings for the KS/Lucy Indexer class.

Marvin Humphrey


Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

So, iterating on the proposed changes to back-compat policy:

  1. If we deprecate an API in the 2.1 release, we can remove it in
     the next minor release (2.2).

  2. JAR drop-in-ability is only guaranteed on point releases (2.4.1
     is a drop-in replacement to 2.4.0).  When switching to a new
     minor release (2.1 -> 2.2) likely you'll need to recompile.

  3. Default settings can change, but if the change is big enough (and
     certainly if it will impact what's indexed or how searches find
     docs/do scoring), we add a required "actsAsVersion" arg to the
     ctor of the affected class.  New users get the latest & greatest,
     and upgraded users keep their old defaults.

  4. [Maybe?] Allow certain limited changes that will require source
     code changes in your app on upgrading to a new minor release:
     adding a new method to an interface, adding a new abstract method
     to an abstract class, renaming of deprecated methods.
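
For item 3, a minimal sketch of what a required "actsAsVersion" ctor
arg might look like.  SketchRelease and SketchFieldSorter are
hypothetical names, not Lucene classes, and the assumption that
scoring-while-sorting flips off in 2.9 is only for illustration.  The
enum deliberately has no LATEST constant, so every caller pins a
specific release.

```java
import java.util.Objects;

// Hypothetical release constants; there is deliberately no LATEST member.
enum SketchRelease { LUCENE_24, LUCENE_29 }

// Hypothetical class whose default changed between releases.
final class SketchFieldSorter {
    private boolean trackScores;

    // The version is required: there is no no-arg constructor.
    SketchFieldSorter(SketchRelease actsAsVersion) {
        Objects.requireNonNull(actsAsVersion, "actsAsVersion is required");
        // Assumed change: scoring while sorting by field was on by default
        // before 2.9 and is off by default from 2.9 onward.
        this.trackScores = actsAsVersion.compareTo(SketchRelease.LUCENE_29) < 0;
    }

    boolean tracksScores() { return trackScores; }

    // Either kind of user can still override the version-derived default.
    void setTrackScores(boolean trackScores) { this.trackScores = trackScores; }
}
```

An app written against 2.4 keeps passing LUCENE_24 after upgrading and
sees no behavior change; a new app passes LUCENE_29 and gets the
faster default.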

Mike


Re: Lucene's default settings & back compatibility

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Fri, May 22, 2009 at 10:40:03PM +0400, Earwin Burrfoot wrote:
> >> Custom analyzers.
> > No problem.
> How are they recorded in the index?

Analyzers must implement dump() and load(), which convert the Analyzer to/from
a JSON-izable data structure.  They end up as JSON in
index_dir/schema_NNN.json.
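
Translated to Java, a dump()/load() pair could look roughly like the
sketch below.  This is not the KinoSearch code: SketchDumpable,
SketchStopAnalyzer and the "_class" key are all hypothetical, and the
step that writes the map out as JSON is left to whatever JSON library
the app uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract: dump() returns a structure made only of maps,
// lists and strings, so it can be handed to a JSON writer.
interface SketchDumpable {
    Map<String, Object> dump();
}

// Hypothetical analyzer whose only state is a stopword list.
final class SketchStopAnalyzer implements SketchDumpable {
    private final List<String> stopwords;

    SketchStopAnalyzer(List<String> stopwords) {
        this.stopwords = new ArrayList<>(stopwords);
    }

    @Override
    public Map<String, Object> dump() {
        Map<String, Object> data = new LinkedHashMap<>();
        // The "_class" key is an assumption; it lets load() verify the type.
        data.put("_class", "SketchStopAnalyzer");
        data.put("stopwords", new ArrayList<>(stopwords));
        return data;
    }

    // Rebuilds an analyzer from the structure produced by dump().
    static SketchStopAnalyzer load(Map<String, Object> data) {
        if (!"SketchStopAnalyzer".equals(data.get("_class"))) {
            throw new IllegalArgumentException("not a SketchStopAnalyzer dump");
        }
        List<String> words = new ArrayList<>();
        for (Object word : (List<?>) data.get("stopwords")) {
            words.add((String) word);
        }
        return new SketchStopAnalyzer(words);
    }
}
```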

Custom subclasses must be loaded by whatever app wants to read the index,
naturally.

> >> Intentionally different analyzers for indexing and searching.
> > No problem.  That only makes sense in the context of QueryParser, and the KS
> > QueryParser allows you to supply an analyzer which overrides the Schema.
> But well, it differs from analyzer used for indexation in one or two
> options, and shares a heap of others.

A constructor argument solves that problem, doesn't it?  Am I missing
something?

> >> Using this analyzer without any index at all - like I do highlight on
> >> a separate machine to minimize GC pauses, or tag docs by running a
> >> heap of queries against MemoryIndex.
> > No problem.  Distribute a Schema subclass among several machines.
> You mean read an index on one machine, create Analyzer, serialize it
> and send over the wire to other machines? I hope that's either a joke
> or I misunderstood you.

Please.  

How did your Analyzer class get on the other machines?  Do the same thing with
your Schema subclass.

> Storing a list of stopwords in the index sounds fun. Storing a fat
> synonym/morphology dictionary while completely analogous, is no longer
> fun.

So, don't store that whole dictionary in the serialized Analyzer -- just store
a version number.  Make the synonym data class data.  

If it's reasonable to key multiple versions of the class data off of the
version number constructor argument, do that.  If not and an index was built
with a version of the Analyzer that is no longer supported, either throw an
exception or intentionally ignore the mismatch and serve screwed up search
results.  Your call.
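
A minimal sketch of that idea, with invented data: SketchSynonymAnalyzer
is hypothetical, the synonym tables are made up, and this version takes
the "throw an exception" branch for unsupported versions.  Only the
version number would end up in the serialized form; the dictionaries
live in the class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical analyzer that serializes only a dictionary version number.
final class SketchSynonymAnalyzer {
    // Class data: one synonym table per supported version (contents invented).
    private static final Map<Integer, Map<String, String>> DICTIONARIES = new HashMap<>();
    static {
        Map<String, String> v1 = new HashMap<>();
        v1.put("car", "automobile");
        Map<String, String> v2 = new HashMap<>(v1);
        v2.put("tv", "television");
        DICTIONARIES.put(1, v1);
        DICTIONARIES.put(2, v2);
    }

    private final int dictVersion;
    private final Map<String, String> synonyms;

    SketchSynonymAnalyzer(int dictVersion) {
        Map<String, String> table = DICTIONARIES.get(dictVersion);
        if (table == null) {
            // The other option would be to ignore the mismatch on purpose.
            throw new IllegalArgumentException(
                "unsupported dictionary version: " + dictVersion);
        }
        this.dictVersion = dictVersion;
        this.synonyms = table;
    }

    // Only this number needs to be stored alongside the index.
    int dump() { return dictVersion; }

    static SketchSynonymAnalyzer load(int storedVersion) {
        return new SketchSynonymAnalyzer(storedVersion);
    }

    String normalize(String term) {
        return synonyms.getOrDefault(term, term);
    }
}
```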

Marvin Humphrey


Re: Lucene's default settings & back compatibility

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Fri, May 22, 2009 at 01:22:24PM -0400, Michael McCandless wrote:
> > Sounds like an argument for more frequent major releases.
> 
> Yeah.  Or "rebranding" what we now call minor as major releases, by
> changing our policy ;) 

Not sure how much of that is a jest, but I don't think that's a good idea.  It
violates commonly held expectations about what constitutes a "minor release".

Of course, I'm not sure to what extent modified interfaces will surprise
people.  At least that's compile-time... but then it will make it harder for
multiple apps with Lucene dependencies to coexist.

> Will Lucy do scoring when sorting by field, by default?

Nope.  Why would we do that?  The only reason you're doing it in Lucene is
to preserve back compat, and Lucy doesn't have that constraint.

Marvin Humphrey


Re: Lucene's default settings & back compatibility

Posted by Earwin Burrfoot <ea...@gmail.com>.

>> Custom analyzers.
> No problem.
How are they recorded in the index?

>> Several indexes using the same analyzer.
> No problem.  Only necessary if the analyzer is costly or has some esoteric
> need for shared state.  And possible via subclassing Schema or Analyzer.
It is.

>> Intentionally different analyzers for indexing and searching.
> No problem.  That only makes sense in the context of QueryParser, and the KS
> QueryParser allows you to supply an analyzer which overrides the Schema.
But well, it differs from analyzer used for indexation in one or two
options, and shares a heap of others.

>> Using this analyzer without any index at all - like I do highlight on
>> a separate machine to minimize GC pauses, or tag docs by running a
>> heap of queries against MemoryIndex.
> No problem.  Distribute a Schema subclass among several machines.
You mean read an index on one machine, create Analyzer, serialize it
and send over the wire to other machines? I hope that's either a joke
or I misunderstood you.

I'm not opposed to the idea itself. It's just that it should be a
layer over existing functionality and in no way something mandatory.
Storing a list of stopwords in the index sounds fun. Storing a fat
synonym/morphology dictionary while completely analogous, is no longer
fun.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


Re: Lucene's default settings & back compatibility

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Fri, May 22, 2009 at 09:06:32PM +0400, Earwin Burrfoot wrote:
> > In KinoSearch SVN trunk, satellite classes like QueryParser and Highlighter
> > have to be passed a Schema, which contains all the Analyzers.  Analyzers
> > aren't satellite classes under this model -- they are a fixed property of a
> > FullTextType field spec.  Think of them as baked into an SQL field definition.
> >
> > You can create a Schema from scratch to pass to the QueryParser, but it's
> > easier to just get it from the Searcher.  Translating to Java...
> >
> >   Searcher searcher = new Searcher("/path/to/index");
> >   QueryParser qparser = new QueryParser(searcher.getSchema());
> >
> > I don't see how that's so different from getting an analyzer actsAsVersion
> > number from the index.
> >
> > Now, where stuff might start to get complicated is PerFieldAnalyzerWrapper...
> > is that where the sneakiness gets overwhelming?
> Some people can have setups more complex than that.
> Different analyzers per field.

Heh.  One of the primary rationales behind Schema was to tie individual
analyzers to specific fields.

> Custom analyzers.

No problem.

> Several indexes using the same analyzer.

No problem.  Only necessary if the analyzer is costly or has some esoteric
need for shared state.  And possible via subclassing Schema or Analyzer.

> Intentionally different analyzers for indexing and searching.

No problem.  That only makes sense in the context of QueryParser, and the KS
QueryParser allows you to supply an analyzer which overrides the Schema.

> Using this analyzer without any index at all - like I do highlight on
> a separate machine to minimize GC pauses, or tag docs by running a
> heap of queries against MemoryIndex.

No problem.  Distribute a Schema subclass among several machines.

These are all solved problems under the per-index field semantics serialized
Schema model.  That's why I said it was the "theoretical solution".

Marvin Humphrey



Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

I'd like to do this for 2.9 :)  I'll open an issue...

(Yes this would just be for diagnostics).

Mike

On Fri, May 22, 2009 at 1:48 PM, DM Smith <dm...@gmail.com> wrote:
> Yonik Seeley wrote:
>>
>> On Fri, May 22, 2009 at 1:22 PM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>
>>>
>>> (That said, unrelated to this discussion, I would actually like to
>>> record per-segment which version of Lucene wrote the segment; this
>>> would be very helpful when debugging issues like LUCENE-1474 where I
>>> need to know if the segments were written by 2.4.0 or 2.4.1).
>>>
>>
>> That's a great idea, if for debugging only, and it shouldn't be
>> limited  to just the version that wrote the segment.  I could see a
>> debug section or file that could even contain more info if the right
>> flags are set.
>
> I would like to see this, too. In addition, I'd like to store what was used
> to create the index, that is the ordered chain of analyzers and filters on a
> per field basis.
>
> But whether it is baked into the index or a separate file, or not part of
> Lucene, I'm in the process of figuring out how/where to add it to my code.
>
> -- DM
>

Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

Yonik Seeley wrote:
> On Fri, May 22, 2009 at 1:22 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>   
>> (That said, unrelated to this discussion, I would actually like to
>> record per-segment which version of Lucene wrote the segment; this
>> would be very helpful when debugging issues like LUCENE-1474 where I
>> need to know if the segments were written by 2.4.0 or 2.4.1).
>>     
>
> That's a great idea, if for debugging only, and it shouldn't be
> limited  to just the version that wrote the segment.  I could see a
> debug section or file that could even contain more info if the right
> flags are set.

I would like to see this, too. In addition, I'd like to store what was 
used to create the index, that is the ordered chain of analyzers and 
filters on a per field basis.

But whether it is baked into the index or a separate file, or not part 
of Lucene, I'm in the process of figuring out how/where to add it to my 
code.

-- DM


Re: Lucene's default settings & back compatibility

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Fri, May 22, 2009 at 1:22 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> (That said, unrelated to this discussion, I would actually like to
> record per-segment which version of Lucene wrote the segment; this
> would be very helpful when debugging issues like LUCENE-1474 where I
> need to know if the segments were written by 2.4.0 or 2.4.1).

That's a great idea, if for debugging only, and it shouldn't be
limited  to just the version that wrote the segment.  I could see a
debug section or file that could even contain more info if the right
flags are set.
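
As a rough picture of such a debug section, here is a sketch of
per-segment diagnostics kept as a string map.  SketchSegmentInfo and
the "writerVersion" and "source" keys are hypothetical; nothing here
describes how Lucene actually stores segment metadata.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-segment record carrying debug-only diagnostics.
final class SketchSegmentInfo {
    private final String name;
    private final Map<String, String> diagnostics;

    SketchSegmentInfo(String name, String writerVersion, Map<String, String> extra) {
        Map<String, String> d = new LinkedHashMap<>(extra);
        // Set last so caller-supplied extras cannot overwrite the version.
        d.put("writerVersion", writerVersion);
        this.name = name;
        this.diagnostics = Collections.unmodifiableMap(d);
    }

    String name() { return name; }

    // Read-only view; meant for debugging, not for changing behavior.
    Map<String, String> diagnostics() { return diagnostics; }
}
```

Keeping the extras open-ended leaves room for "more info if the right
flags are set" without changing the record's shape.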

-Yonik
http://www.lucidimagination.com


Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

OK, net/net it doesn't look like we're going to reach agreement on some
general approach for having users of Lucene always get the best
default settings.

We started with the *Settings classes, but that's really a very large
project (goes far beyond managing defaults for new users).

Then we went to the other end of the spectrum with a single global
actsAsVersion, but sneaky spooky action at a distance bugs nixed that.

We thought about storing actsAsVersion in the index, but that's too
automagic (and would also lead to sneaky bugs / confusion).

Finally we considered passing in actsAsVersion to those classes
that need to change their defaults, but people would prefer Settings
over this.

Unless there are other ideas out there, I think at this point we
should just fall back to the "make a new method making the setting
explicit, and deprecate the old one" approach.  It achieves the goal,
on a case by case basis, without changing our back-compat policy nor
adding any new Settings/actsAsVersion infrastructure to Lucene.
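
To make that approach concrete, here is a minimal sketch using the
read-only reader case as the example.  SketchReader and its open
methods are hypothetical stand-ins, not Lucene's IndexReader API.  The
old method keeps its old default and is deprecated; the new method
forces the caller to state the setting.

```java
// Hypothetical reader; illustrates the pattern only.
final class SketchReader {
    private final String path;
    private final boolean readOnly;

    private SketchReader(String path, boolean readOnly) {
        this.path = path;
        this.readOnly = readOnly;
    }

    /**
     * Keeps the old default (read/write) so existing callers see no change.
     *
     * @deprecated use {@link #open(String, boolean)} and state the setting.
     */
    @Deprecated
    static SketchReader open(String path) {
        return open(path, false);
    }

    // New method: the setting is explicit, so no default has to be chosen.
    static SketchReader open(String path, boolean readOnly) {
        return new SketchReader(path, readOnly);
    }

    String path() { return path; }

    boolean isReadOnly() { return readOnly; }
}
```

Existing callers get a deprecation warning that points them at the new
method, and the old one can be removed at the next major release.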

Mike

On Fri, May 22, 2009 at 2:20 PM, DM Smith <dm...@gmail.com> wrote:
> Michael McCandless wrote:
>>
>> On Fri, May 22, 2009 at 12:52 PM, Marvin Humphrey
>> <ma...@rectangular.com> wrote:
>>
>>>>
>>>> when working on 3.1 if we make some great improvement, I'd like new
>>>> users in
>>>> 3.1 to see the improvement by default.
>>>>
>>>
>>> Sounds like an argument for more frequent major releases.
>>>
>>
>> Yeah.  Or "rebranding" what we now call minor as major releases, by
>> changing our policy ;) Or "rebranding" to Lucene 2009.
>>
>> But: localized improvements (like the sizable performance gain from
>> turning off scoring when sorting by field) should not have to wait for
>> a major release to benefit new users.  I think they should be on by
>> default on the next release.
>
> This proposed policy change of allowing backward compatibility in the API to
> change within a major release is nothing more than smoke and mirrors. But I
> see two side effects:
> 1) Debian, Fedora, and perhaps other Linux distributions, see minor releases
> as maintaining backward compatibility. With Debian, they bump their major
> revision number with each break in backward compatibility. I didn't check,
> but my guess is that the version name of Lucene in Debian corresponds with
> that of Lucene itself. I'd hate for that to change. How would you like to
> see Debian name it Lucene 4 or Lucene 5 when we are doing Lucene 3.x? It
> gets confusing. (Real example:  libsword7, which corresponds to the 1.5.11
> release of SWORD and libsword8 corresponds to 1.6.0.)
>
> 2) Backward compatibility of the index is at least 2 major revisions and
> that is not proposed to change. Now with this, we effectively postpone it
> indefinitely. Rather than the index being allowed to change when the API has
> broken compatibility at most 2 times, with this proposed change, we can
> break API compatibility a dozen times. At the future point where this policy
> is brought into question, with something like "Now that we can break
> backward compatibility in the API frequently, we need to change our policy
> for the index to match", then we will have come full circle.
>
> At first, I liked the idea a lot, but now less so. Now I'm leaning toward
> changing major revision number when backward compatibility changes and for
> more frequent major releases if that is what it takes.
>
> This was the thrust of my tongue-in-cheek proposal of weekly minor and
> monthly major releases.
>
> I also share Marvin's and others' concerns about sneaky bugs introduced by
> globals. In my situation, Lucene is part of a desktop application and the
> user can create hundreds of indexes and use them within the application.
> With a *.deb or *.rpm, we'll have to specify that they cannot use anything
> but the minor release for which the application was designed. Before, we
> could say that one could drop in anything of the same major release number.
>
> I don't think I am alone or unique in embedding Lucene into a desktop
> application. I know it is a part of Eclipse (at least on Fedora).
>
> This change might have the opposite effect: making people perceive
> Lucene as unstable. Guard carefully against that, please!
>
> -- DM

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Lucene's default settings & back compatibility

Posted by Earwin Burrfoot <ea...@gmail.com>.

> In KinoSearch SVN trunk, satellite classes like QueryParser and Highlighter
> have to be passed a Schema, which contains all the Analyzers.  Analyzers
> aren't satellite classes under this model -- they are a fixed property of a
> FullTextType field spec.  Think of them as baked into an SQL field definition.
>
> You can create a Schema from scratch to pass to the QueryParser, but it's
> easier to just get it from the Searcher.  Translating to Java...
>
>   Searcher searcher = new Searcher("/path/to/index");
>   QueryParser qparser = new QueryParser(searcher.getSchema());
>
> I don't see how that's so different from getting an analyzer actsAsVersion
> number from the index.
>
> Now, where stuff might start to get complicated is PerFieldAnalyzerWrapper...
> is that where the sneakiness gets overwhelming?

Some people have setups more complex than that:

  * Different analyzers per field.
  * Custom analyzers.
  * Several indexes using the same analyzer.
  * Intentionally different analyzers for indexing and searching.
  * Using an analyzer without any index at all: for example, I do
    highlighting on a separate machine to minimize GC pauses, or tag
    docs by running a heap of queries against MemoryIndex.

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

Michael McCandless wrote:
> On Fri, May 22, 2009 at 12:52 PM, Marvin Humphrey
> <ma...@rectangular.com> wrote:
>   
>>> when working on 3.1 if we make some great improvement, I'd like new users in
>>> 3.1 to see the improvement by default.
>>>       
>> Sounds like an argument for more frequent major releases.
>>     
>
> Yeah.  Or "rebranding" what we now call minor as major releases, by
> changing our policy ;) Or "rebranding" to Lucene 2009.
>
> But: localized improvements (like the sizable performance gain from
> turning off scoring when sorting by field) should not have to wait for
> a major release to benefit new users.  I think they should be on by
> default on the next release.

This proposed policy change of allowing backward compatibility in the 
API to change within a major release is nothing more than smoke and 
mirrors. But I see two side effects:
1) Debian, Fedora, and perhaps other Linux distributions, see minor 
releases as maintaining backward compatibility. With Debian, they bump 
their major revision number with each break in backward compatibility. I 
didn't check, but my guess is that the version name of Lucene in Debian 
corresponds with that of Lucene itself. I'd hate for that to change. How
would you like to see Debian name it Lucene 4 or Lucene 5 when we
are doing Lucene 3.x? It gets confusing. (Real example:  libsword7,
which corresponds to the 1.5.11 release of SWORD and libsword8 
corresponds to 1.6.0.)

2) Backward compatibility of the index is at least 2 major revisions and 
that is not proposed to change. Now with this, we effectively postpone 
it indefinitely. Rather than the index being allowed to change when the 
API has broken compatibility at most 2 times, with this proposed change, 
we can break API compatibility a dozen times. At the future point where 
this policy is brought into question, with something like "Now that we 
can break backward compatibility in the API frequently, we need to 
change our policy for the index to match", then we will have come full 
circle.

At first, I liked the idea a lot, but now less so. Now I'm leaning 
toward changing major revision number when backward compatibility 
changes and for more frequent major releases if that is what it takes.

This was the thrust of my tongue-in-cheek proposal of weekly minor and 
monthly major releases.

I also share Marvin's and others' concerns about sneaky bugs introduced 
by globals. In my situation, Lucene is part of a desktop application and 
the user can create hundreds of indexes and use them within the 
application. With a *.deb or *.rpm, we'll have to specify that they 
cannot use anything but the minor release for which the application was 
designed. Before, we could say that one could drop in anything of the 
same major release number.

I don't think I am alone or unique in embedding Lucene into a desktop 
application. I know it is a part of Eclipse (at least on Fedora).

This change might have the opposite effect: making people perceive
Lucene as unstable. Guard carefully against that, please!

-- DM

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Fri, May 22, 2009 at 12:52 PM, Marvin Humphrey
<ma...@rectangular.com> wrote:
>
>> when working on 3.1 if we make some great improvement, I'd like new users in
>> 3.1 to see the improvement by default.
>
> Sounds like an argument for more frequent major releases.

Yeah.  Or "rebranding" what we now call minor as major releases, by
changing our policy ;) Or "rebranding" to Lucene 2009.

But: localized improvements (like the sizable performance gain from
turning off scoring when sorting by field) should not have to wait for
a major release to benefit new users.  I think they should be on by
default on the next release.

Will Lucy do scoring when sorting by field, by default?

>> On thinking about it more... automagically storing the "actsAsVersion"
>> in the index, and then having IndexWriter (for example) ask the
>> analyzer for a tokenStream matching that version, seems a little too
>> sneaky.
>
> Can you elaborate?
>
> In KinoSearch SVN trunk, satellite classes like QueryParser and Highlighter
> have to be passed a Schema, which contains all the Analyzers.  Analyzers
> aren't satellite classes under this model -- they are a fixed property of a
> FullTextType field spec.  Think of them as baked into an SQL field definition.
>
> You can create a Schema from scratch to pass to the QueryParser, but it's
> easier to just get it from the Searcher.  Translating to Java...
>
>   Searcher searcher = new Searcher("/path/to/index");
>   QueryParser qparser = new QueryParser(searcher.getSchema());
>
> I don't see how that's so different from getting an analyzer actsAsVersion
> number from the index.

I agree in KS/Lucy, it works well, because you must explicitly pass in
Schema to each of the satellite classes.

But in Lucene, if IndexWriter passed in the actsAsVersion it had
loaded from the index whenever it asked the analyzer for a
tokenStream, that's sneaky.  I'd rather have it explicit (like
KS/Lucy), so you'd have to call IndexWriter.getActsAsVersion, then pass
that into your analyzer when you create it.  It's the automatic
under-the-hood passing that makes me nervous and I think would confuse
users.
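
To make the explicit flavor concrete, here is a rough sketch.  Every
type in it is hypothetical (there is no getActsAsVersion in Lucene
today), versions are encoded as small ints (24 for 2.4, 29 for 2.9)
purely for brevity, and the acronym switch is only an illustrative
example of a versioned behavior.

```java
// Hypothetical types throughout; none of these exist in Lucene.
final class SketchIndexWriter {
    // Assumed to have been read from the index when the writer was opened.
    private final int actsAsVersion;

    SketchIndexWriter(int actsAsVersionFromIndex) {
        this.actsAsVersion = actsAsVersionFromIndex;
    }

    /** The writer only reports the version; it never pushes it into analyzers. */
    int getActsAsVersion() {
        return actsAsVersion;
    }
}

final class SketchVersionedAnalyzer {
    private final int actsAsVersion;

    /** The caller hands the version over explicitly, in plain sight. */
    SketchVersionedAnalyzer(int actsAsVersion) {
        this.actsAsVersion = actsAsVersion;
    }

    /** Illustrative only: pretend an acronym fix became the default in 2.9. */
    boolean replaceInvalidAcronym() {
        return actsAsVersion >= 29;
    }
}
```

The wiring is then one visible line in the application, something like
new SketchVersionedAnalyzer(writer.getActsAsVersion()), rather than
something IndexWriter does behind your back.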

(That said, unrelated to this discussion, I would actually like to
record per-segment which version of Lucene wrote the segment; this
would be very helpful when debugging issues like LUCENE-1474 where I
need to know if the segments were written by 2.4.0 or 2.4.1).

> Now, where stuff might start to get complicated is PerFieldAnalyzerWrapper...
> is that where the sneakiness gets overwhelming?

Per-class actsAsVersion would work well here -- PFAW would just
forward the required version when requesting the tokenStream.

>> I prefer the up-front "you specify actsAsVersion" when you
>> create the analyzer, only for analyzers that have changed across
>> releases.  So things like WhitespaceAnalyzer would likely never need
>> an actsAsVersion arg.
>
> Hmm, this is kind of hard.  I'd prefer that the argument remain optional, so
> that new users don't have to think about it.

I wouldn't mind optional, but only if it defaults to latest and
greatest.  The goal here is to have new users always see the best of
Lucene when they start out.
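
A small sketch of "optional, defaulting to latest", with hypothetical
names (SketchStopAnalyzer and LATEST are not Lucene API, and versions
are again encoded as ints such as 29 for 2.9).  The no-arg constructor
tracks the newest behavior; anyone who needs the old behavior has to
ask for it by version.

```java
// Hypothetical sketch; not a Lucene class.
final class SketchStopAnalyzer {
    /** Assumed constant naming the release this jar was built as (2.9 here). */
    static final int LATEST = 29;

    private final int actsAsVersion;

    /** New users write this and always get the current defaults. */
    SketchStopAnalyzer() {
        this(LATEST);
    }

    /** Upgrading users pin the version their index was built with. */
    SketchStopAnalyzer(int actsAsVersion) {
        this.actsAsVersion = actsAsVersion;
    }

    /** Illustrative versioned default: position increments on from 2.9. */
    boolean enablePositionIncrements() {
        return actsAsVersion >= 29;
    }
}
```

The catch is the one Marvin raised: an upgrading user who leaves the
argument off silently gets the new behavior against an old index.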

Mike

Re: Lucene's default settings & back compatibility

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Fri, May 22, 2009 at 11:33:33AM -0400, Michael McCandless wrote:

> when working on 3.1 if we make some great improvement, I'd like new users in
> 3.1 to see the improvement by default.  

Sounds like an argument for more frequent major releases.  But I'm not exactly
one to talk.  ;)

> On thinking about it more... automagically storing the "actsAsVersion"
> in the index, and then having IndexWriter (for example) ask the
> analyzer for a tokenStream matching that version, seems a little too
> sneaky.  

Can you elaborate?

In KinoSearch SVN trunk, satellite classes like QueryParser and Highlighter
have to be passed a Schema, which contains all the Analyzers.  Analyzers
aren't satellite classes under this model -- they are a fixed property of a
FullTextType field spec.  Think of them as baked into an SQL field definition.

You can create a Schema from scratch to pass to the QueryParser, but it's
easier to just get it from the Searcher.  Translating to Java... 

   Searcher searcher = new Searcher("/path/to/index");
   QueryParser qparser = new QueryParser(searcher.getSchema());

I don't see how that's so different from getting an analyzer actsAsVersion
number from the index.

Now, where stuff might start to get complicated is PerFieldAnalyzerWrapper...
is that where the sneakiness gets overwhelming?

> I prefer the up-front "you specify actsAsVersion" when you
> create the analyzer, only for analyzers that have changed across
> releases.  So things like WhitespaceAnalyzer would likely never need
> an actsAsVersion arg.

Hmm, this is kind of hard.  I'd prefer that the argument remain optional, so
that new users don't have to think about it.  But then, unlike in KS/Lucy,
there's a danger of leaving it off inadvertently and getting the wrong
behavior. :\

Marvin Humphrey

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, May 21, 2009 at 6:53 PM, Marvin Humphrey <ma...@rectangular.com> wrote:

> Lastly, I think a major java Lucene release is justified already.
> Won't this discussion die down somewhat if you can get 3.0 out?

Somewhat, yes, but then when working on 3.1 if we make some great
improvement, I'd like new users in 3.1 to see the improvement by
default.  So I'd like to resolve this for 2.9/3.0.

But I agree we gotta wrap 2.9 and move on.  It will feel great,
removing code & fixing defaults in 3.0!

> Full-on schema serialization isn't feasible for Lucene, but
> attaching an actsAsVersion variable to an index and feeding that to
> your analyzers would be a decent start.

On thinking about it more... automagically storing the "actsAsVersion"
in the index, and then having IndexWriter (for example) ask the
analyzer for a tokenStream matching that version, seems a little too
sneaky.  I prefer the up-front "you specify actsAsVersion" when you
create the analyzer, only for analyzers that have changed across
releases.  So things like WhitespaceAnalyzer would likely never need
an actsAsVersion arg.

Mike

Re: Lucene's default settings & back compatibility

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Thu, May 21, 2009 at 05:19:43PM -0400, Michael McCandless wrote:

> Marvin, which solution would you prefer?

Between the two, I'd prefer settings constructor arguments, though I would be
inclined to have settings classes that are specific to individual classes
rather than Lucene-wide.  

At least that scheme gets locality right.  The global actsAsVersion variable
violates that principle and has the potential to saddle a small number of
users who have done absolutely nothing wrong with bugs that are very, very
hard to hunt down.  That's unfair.
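
To illustrate the locality problem, here is a deliberately tiny sketch.
All names are hypothetical and versions are encoded as ints (24 for
2.4, 29 for 2.9).  The no-arg constructor consults a JVM-wide static, so
any other code in the process can change what it does; the settings
constructor depends only on what the caller passed.

```java
// Hypothetical sketch: none of these classes exist in Lucene.
final class SketchGlobalDefaults {
    // One mutable value shared by every caller in the JVM.
    static int actsAsVersion = 29;
}

// Per-class settings object: scoped to the one class it configures.
final class SketchSortSettings {
    final boolean scoreDocs;

    SketchSortSettings(boolean scoreDocs) {
        this.scoreDocs = scoreDocs;
    }
}

final class SketchFieldSorter {
    private final boolean scoreDocs;

    // Global flavor: behavior depends on whatever the static holds right now.
    SketchFieldSorter() {
        this.scoreDocs = SketchGlobalDefaults.actsAsVersion < 29;
    }

    // Local flavor: behavior depends only on what this caller passed in.
    SketchFieldSorter(SketchSortSettings settings) {
        this.scoreDocs = settings.scoreDocs;
    }

    boolean scoresDocs() {
        return scoreDocs;
    }
}
```

If some unrelated library in the same process sets the static to 24,
every later new SketchFieldSorter() starts scoring again, and nothing at
the call site hints at why.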

As far as analyzers and token streams, the theoretical answer is making
indexes self-describing via serializable schemas, as discussed on the Lucy dev
list, and as implemented in KinoSearch svn trunk.  With versioning metadata
attached to the index, there is no longer any worry about upgrading analysis
modules provided that those modules handle their own versioning correctly.

For instance, in KS the Stopalizer always embeds the complete stoplist in the
schema file, so even if we update the "English" stoplist, we don't get invalid
search results for indexes which were created with the old stoplist.
Similarly, it may not be possible to keep around multiple variants of
Snowball, but at least we can fail catastrophically instead of subtly if we
detect that the Snowball version has changed.
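
Roughly, and translating to Java with made-up names (this is not
KinoSearch's actual code), the idea could be sketched like this: the
field spec stored with the index carries its own copy of the stoplist
plus the stemmer version, and opening the index with a different
stemmer fails loudly.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of a self-describing field spec.
final class SketchFieldSpec {
    /** Full copy of the stoplist used at index time, not just a name like "English". */
    final Set<String> stoplist;
    /** Version string of the stemmer used at index time. */
    final String stemmerVersion;

    SketchFieldSpec(Set<String> stoplist, String stemmerVersion) {
        // Defensive copy: later edits to the caller's set must not affect the spec.
        this.stoplist = Collections.unmodifiableSet(new LinkedHashSet<>(stoplist));
        this.stemmerVersion = stemmerVersion;
    }

    /** Fails catastrophically, rather than subtly, on a stemmer mismatch. */
    void checkCompatible(String installedStemmerVersion) {
        if (!stemmerVersion.equals(installedStemmerVersion)) {
            throw new IllegalStateException("index was built with stemmer "
                + stemmerVersion + " but " + installedStemmerVersion
                + " is installed; re-index required");
        }
    }
}
```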

Full-on schema serialization isn't feasible for Lucene, but attaching an
actsAsVersion variable to an index and feeding that to your analyzers would be
a decent start.

Lastly, I think a major java Lucene release is justified already.  Won't this
discussion die down somewhat if you can get 3.0 out?  If there are issues that
are half done, how about rolling back whatever's in the way?

Marvin Humphrey


Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, May 21, 2009 at 1:59 PM, Marvin Humphrey <ma...@rectangular.com> wrote:

> That bug has led to 'base' having a compromised reputation among elite users
> because of intermittent, inexplicable flakiness.  Is that what you want for
> Lucene?

While I agree a single global default is not great, I do think it's
the lesser of all evils here.

It really bothers me that our new users must wait so long (years) to
see improvements to our default settings, because we are so careful
about back-compat.

Marvin, which solution would you prefer?

Mike

Re: Lucene's default settings & back compatibility

Posted by Robert Muir <rc...@gmail.com>.

Yeah, I was thinking of the more likely case where something like "teh" is
in the list...

On Thu, May 21, 2009 at 12:25 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Thu, May 21, 2009 at 12:19 PM, Robert Muir <rc...@gmail.com> wrote:
> > even as simple as changing default stopword list for some analyzer could be
> > an issue, if the user doesn't re-index in response to that change.
>
> OK, right.
>
> So say we forgot to include "the" in the default English stopwords
> list (yes, an extreme example...).
>
> Under the proposed changes 1 & 2 to back-compat policy, we would add
> "the" to the default stopword list, so new users get the fix, but
> still keep the the-less list accessible (deprecated).  We'd add an
> entry in CHANGES.txt saying this happened, and then show code on how
> to get back to the the-less stopword list.
>
> New users using that StopFilter would properly see "the" filtered out.
>  Users who upgraded would need to fix their code to switch back to the
> deprecated the-less list.
>
> Mike


-- 
Robert Muir
rcmuir@gmail.com

Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

Michael McCandless wrote:
> On Thu, May 21, 2009 at 12:19 PM, Robert Muir <rc...@gmail.com> wrote:
>   
>> even as simple as changing default stopword list for some analyzer could be
>> an issue, if the user doesn't re-index in response to that change.
>>     
>
> OK, right.
>
> So say we forgot to include "the" in the default English stopwords
> list (yes, an extreme example...).
>   

"The" would be a bug fix. I think most users would expect that to be
fixed. They might be willing, as I would be, to require all their
indexes using that stopword list to be rebuilt.

What about a change that would be a bit more controversial, one that
some would agree with and others would not?

I wonder how many people are creating metadata about indexes so that 
they can track when an index could/should/must be rebuilt? Some kind of 
"versioned tool chain info" for the index. If analyzers and filters can 
change output then it needs to be tracked.

-- DM

Re: Lucene's default settings & back compatibility

Posted by Matthew Hall <mh...@informatics.jax.org>.

Sorry, I wasn't quite sure what to call this new class you guys have 
been talking about.

I was referring to the class that's being discussed to encapsulate all
of the defaults for a given Lucene release (its caching strategies,
etc.).

I'm just not certain that something like a static list of words belongs
in a higher-level defaults class like the one you guys are talking about,
especially considering that anyone using a stop-enabled analyzer really
should be familiar with this list, and oftentimes needs to override it.

Meh, now that I'm actually typing it out, perhaps I'm incorrect here.
Assuming this class you guys are describing will be well
advertised/documented, maybe it will actually make it easier for end
developers to twiddle around with this list, or at least make them more
aware that it's something they have the ability to change.

Matt

Michael McCandless wrote:
> What is the "lucene defaults class"?
>
> Mike
>
> On Thu, May 21, 2009 at 12:37 PM, Matthew Hall
> <mh...@informatics.jax.org> wrote:
>   
>> For extreme examples like this, couldn't the stopword list be encapsulated
>> into a single class that's used by the lucene defaults class.
>>
>> That way if you folks released updates to mostly static content like a
>> stopword list, new or old users could get it easily with a simple drop in
>> fix?
>>
>> Just my two cents.
>>
>> Matt
>>
>> Michael McCandless wrote:
>>     
>>> On Thu, May 21, 2009 at 12:19 PM, Robert Muir <rc...@gmail.com> wrote:
>>>
>>>       
>>>> even as simple as changing default stopword list for some analyzer could be
>>>> an issue, if the user doesn't re-index in response to that change.
>>>>
>>>>         
>>> OK, right.
>>>
>>> So say we forgot to include "the" in the default English stopwords
>>> list (yes, an extreme example...).
>>>
>>> Under the proposed changes 1 & 2 to back-compat policy, we would add
>>> "the" to the default stopword list, so new users get the fix, but
>>> still keep the the-less list accessible (deprecated).  We'd add an
>>> entry in CHANGES.txt saying this happened, and then show code on how
>>> to get back to the the-less stopword list.
>>>
>>> New users using that StopFilter would properly see "the" filtered out.
>>>  Users who upgraded would need to fix their code to switch back to the
>>> deprecated the-less list.
>>>
>>> Mike
>> --
>> Matthew Hall
>> Software Engineer
>> Mouse Genome Informatics
>> mhall@informatics.jax.org
>> (207) 288-6012


-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
mhall@informatics.jax.org
(207) 288-6012



Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

What is the "lucene defaults class"?

Mike

On Thu, May 21, 2009 at 12:37 PM, Matthew Hall
<mh...@informatics.jax.org> wrote:
> For extreme examples like this, couldn't the stopword list be encapsulated
> into a single class that's used by the lucene defaults class.
>
> That way if you folks released updates to mostly static content like a
> stopword list, new or old users could get it easily with a simple drop in
> fix?
>
> Just my two cents.
>
> Matt
>
> Michael McCandless wrote:
>>
>> On Thu, May 21, 2009 at 12:19 PM, Robert Muir <rc...@gmail.com> wrote:
>>
>>>
>>> even as simple as changing default stopword list for some analyzer could be
>>> an issue, if the user doesn't re-index in response to that change.
>>>
>>
>> OK, right.
>>
>> So say we forgot to include "the" in the default English stopwords
>> list (yes, an extreme example...).
>>
>> Under the proposed changes 1 & 2 to back-compat policy, we would add
>> "the" to the default stopword list, so new users get the fix, but
>> still keep the the-less list accessible (deprecated).  We'd add an
>> entry in CHANGES.txt saying this happened, and then show code on how
>> to get back to the the-less stopword list.
>>
>> New users using that StopFilter would properly see "the" filtered out.
>>  Users who upgraded would need to fix their code to switch back to the
>> deprecated the-less list.
>>
>> Mike
>
>
> --
> Matthew Hall
> Software Engineer
> Mouse Genome Informatics
> mhall@informatics.jax.org
> (207) 288-6012

Re: Lucene's default settings & back compatibility

Posted by Matthew Hall <mh...@informatics.jax.org>.

For extreme examples like this, couldn't the stopword list be
encapsulated into a single class that's used by the Lucene defaults class?

That way, if you folks released updates to mostly static content like a
stopword list, new or old users could get it easily with a simple
drop-in fix.

Just my two cents.

Matt

Michael McCandless wrote:
> On Thu, May 21, 2009 at 12:19 PM, Robert Muir <rc...@gmail.com> wrote:
>   
>> even as simple as changing default stopword list for some analyzer could be
>> an issue, if the user doesn't re-index in response to that change.
>>     
>
> OK, right.
>
> So say we forgot to include "the" in the default English stopwords
> list (yes, an extreme example...).
>
> Under the proposed changes 1 & 2 to back-compat policy, we would add
> "the" to the default stopword list, so new users get the fix, but
> still keep the the-less list accessible (deprecated).  We'd add an
> entry in CHANGES.txt saying this happened, and then show code on how
> to get back to the the-less stopword list.
>
> New users using that StopFilter would properly see "the" filtered out.
>  Users who upgraded would need to fix their code to switch back to the
> deprecated the-less list.
>
> Mike


-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
mhall@informatics.jax.org
(207) 288-6012



Re: Lucene's default settings & back compatibility

Posted by Marvin Humphrey <ma...@rectangular.com>.

Mike McCandless:

> Well this is what I love about the actsAsVersion solution.  There's no
> pain for our back-compat users (besides the one-time effort to set
> actsAsVersion), and new users always get the best settings.

When some mad-as-hell user complains to this list after spending an inordinate
amount of time chasing down an action-at-a-distance bug because of this
insidious and irresponsible OO design decision, I intend to follow up their
email with an I-told-you-so.

There's an action-at-a-distance bug in the Perl core module 'base.pm' that
bedeviled people for years before I finally cornered it.  Turns out it can't
be fixed, but at least now we know what's happening:

    http://rt.cpan.org/Public/Bug/Display.html?id=28799

    While this error does not occur frequently in the wild, when it does, the
    cost to the user is high because the debug path is obscure. I personally
    encountered it after failing to wrap a "use_ok" test in a BEGIN block;
    isolating it took me... longer than I would have liked. ;)

That bug has led to 'base' having a compromised reputation among elite users
because of intermittent, inexplicable flakiness.  Is that what you want for
Lucene?

Marvin Humphrey


Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, May 21, 2009 at 12:43 PM, Mark Miller <ma...@gmail.com> wrote:

> Hmmm - that's starting to sound nastier. It's another barrier to upgrading to
> a new jar. I have to monitor/hunt down and not miss all these little flags
> so that docs/terms don't disappear from my index? There is already some of
> that and I'd hate to see it grow. I'd rather have a stronger back compat
> policy here I think. It's just one thing after another stacking up to make
> upgrading a risk/pain at each jar update. I used to work on a project where
> I upgraded Lucene often, and it was because it was so easy to keep dropping
> in and picking new features as I wanted. We will really start pushing a
> heavy onus onto our users if we fully adopt 1 and 2. New users will benefit,
> but old users, unless they are Lucene hackers like you guys, will suffer.
>  Eventually our new users will be our old users.

Well this is what I love about the actsAsVersion solution.  There's no
pain for our back-compat users (besides the one-time effort to set
actsAsVersion), and new users always get the best settings.

Or... we could consider encoding "actsAsVersion" into the index by
default.  Then, when IndexWriter asks the Analyzer for a tokenStream,
it'd pass in the actsAsVersion, so that any tokenizers/filters in the
chain would preserve their behavior as of that Lucene version.  (You'd
have to be able to turn this off, too).

Mike

Re: Lucene's default settings & back compatibility

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless wrote:
> On Thu, May 21, 2009 at 12:19 PM, Robert Muir <rc...@gmail.com> wrote:
>   
>> even as simple as changing default stopword list for some analyzer could be
>> an issue, if the user doesn't re-index in response to that change.
>>     
>
> OK, right.
>
> So say we forgot to include "the" in the default English stopwords
> list (yes, an extreme example...).
>
> Under the proposed changes 1 & 2 to back-compat policy, we would add
> "the" to the default stopword list, so new users get the fix, but
> still keep the the-less list accessible (deprecated).  We'd add an
> entry in CHANGES.txt saying this happened, and then show code on how
> to get back to the the-less stopword list.
>
> New users using that StopFilter would properly see "the" filtered out.
>  Users who upgraded would need to fix their code to switch back to the
> deprecated the-less list.
>
> Mike
Hmmm - that's starting to sound nastier. It's another barrier to upgrading
to a new jar. I have to monitor/hunt down and not miss all these little
flags so that docs/terms don't disappear from my index? There is already
some of that and I'd hate to see it grow. I'd rather have a stronger
back compat policy here I think. It's just one thing after another
stacking up to make upgrading a risk/pain at each jar update. I used to
work on a project where I upgraded Lucene often, and it was because it
was so easy to keep dropping in the jar and picking up new features as I
wanted. We will really start pushing a heavy onus onto our users if we
fully adopt 1 and 2. New users will benefit, but old users, unless they
are Lucene hackers like you guys, will suffer.  Eventually our new users
will be our old users.

I'm fully on the fence. I think relaxing will help development, but 
Lucene's stability has also been a strong quality. It would be nice to 
see it remain in some form.

-- 
- Mark

http://www.lucidimagination.com




Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, May 21, 2009 at 12:19 PM, Robert Muir <rc...@gmail.com> wrote:
> even as simple as changing default stopword list for some analyzer could be
> an issue, if the user doesn't re-index in response to that change.

OK, right.

So say we forgot to include "the" in the default English stopwords
list (yes, an extreme example...).

Under the proposed changes 1 & 2 to back-compat policy, we would add
"the" to the default stopword list, so new users get the fix, but
still keep the the-less list accessible (deprecated).  We'd add an
entry in CHANGES.txt saying this happened, and then show code on how
to get back to the the-less stopword list.

New users using that StopFilter would properly see "the" filtered out.
Users who upgraded would need to fix their code to switch back to the
deprecated the-less list.

Mike

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, May 21, 2009 at 12:46 PM, DM Smith <dm...@gmail.com> wrote:
> I'm looking forward to the repackaging effort.

I'm looking forward to it too!  I can't wait for NumericRangeQuery...

But: someone with a serious Ant skill set, and some time, needs to get
the itch here and start iterating...

Mike

Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

Michael McCandless wrote:
> On Thu, May 21, 2009 at 8:24 AM, DM Smith <dm...@gmail.com> wrote:
>   
>> On May 21, 2009, at 7:17 AM, Michael McCandless wrote:
>>
>>     
>>>  1) Default settings can change; we will always choose defaults based
>>>    on "latest & greatest for new users".  This only affects "runtime
>>>    behavior".  EG in 2.9, when sorting by field you won't get scores
>>>    by default.  When we do this we should clearly document the
>>>    change, and what settings one could use to get back to the old
>>>    behavior, in CHANGES.txt.
>>>       
>> I'd reverse 1 and 2 and note in 1 that the old behavior might be deprecated.
>>     
>
> OK.
>
>   
>>>  2) An API, once released as deprecated, is fair game to be removed
>>>    in the next minor release.
>>>       
>> I presume you mean that it will be present for at least one full minor
>> release. So, if at 3.1.5 a deprecation is introduced, then it won't be
>> removed until 3.3 at the earliest, because 3.2 was the first minor release
>> in which it appeared at the start. I don't think it is fair to expect users
>> to get every last point release.
>>     
>
> Right.
>
>   
>>> We still only make bug fixes on point releases, support the index file
>>> format until the next major release -- those don't change.
>>>       
>> Is it just the index file format? I would hope that the behavior of filters,
>> analyzers and such would not change so as to invalidate an index.
>>     
>
> Can you give an example of such changes?  EG if we fix a bug in
> StandardAnalyzer, we will default it to fixed for new users and expect
> you on upgrading to read CHANGES.txt and change your app to set that
> setting to its non-defaulted value.
>   
I guess I'm not too concerned with bug fixes. I'm kind of a nut when it 
comes to correctness. But, I'd want to know that such a bug broke strict 
backward compatibility. I guess I don't want backward compatibility to 
get too much in the way of fixing bugs. (I think sometimes it has.) I 
wouldn't expect a compatibility flag to preserve buggy behavior. I guess 
I'm willing to go to extra effort to work with bug fixes. But I wouldn't 
expect others to feel the same way.

Off the top of my head, in addition to Robert's stop word list, let's 
say that the filter that strips accents (I can't remember the name) is 
changed to be more than Latin-1 to ASCII folding. That would invalidate 
existing indexes.

Or a new and improved filter is created to replace a class I use and the 
old class is deprecated. If that old class goes away, my index is 
invalidated.

So if the stream of tokens out of an analyzer changes or the result of
a filter is different, an index built with them is invalidated. If the
output remains the same, I shouldn't care what has changed internally
and probably don't care if the API has changed.

I don't know if it matters to this discussion, but there's a lot in 
contrib that people (of which I am one :) expect to be stable. I'm 
looking forward to the repackaging effort.

-- DM



Re: Lucene's default settings & back compatibility

Posted by Robert Muir <rc...@gmail.com>.

Even something as simple as changing the default stopword list for some
analyzer could be an issue, if the user doesn't re-index in response to
that change.


> Can you give an example of such changes?  EG if we fix a bug in
> StandardAnalyzer, we will default it to fixed for new users and expect
> you on upgrading to read CHANGES.txt and change your app to set that
> setting to its non-defaulted value.
>
> Mike
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, May 21, 2009 at 8:24 AM, DM Smith <dm...@gmail.com> wrote:
>
> On May 21, 2009, at 7:17 AM, Michael McCandless wrote:
>
>>  1) Default settings can change; we will always choose defaults based
>>    on "latest & greatest for new users".  This only affects "runtime
>>    behavior".  EG in 2.9, when sorting by field you won't get scores
>>    by default.  When we do this we should clearly document the
>>    change, and what settings one could use to get back to the old
>>    behavior, in CHANGES.txt.
>
> I'd reverse 1 and 2 and note in 1 that the old behavior might be deprecated.

OK.

>>  2) An API, once released as deprecated, is fair game to be removed
>>    in the next minor release.
>
> I presume you mean that it will be present for at least one full minor
> release. So, if at 3.1.5 a deprecation is introduced, then it won't be
> removed until 3.3 at the earliest, because 3.2 was the first minor release
> in which it appeared at the start. I don't think it is fair to expect users
> to get every last point release.

Right.

>> We still only make bug fixes on point releases, support the index file
>> format until the next major release -- those don't change.
>
> Is it just the index file format? I would hope that the behavior of filters,
> analyzers and such would not change so as to invalidate an index.

Can you give an example of such changes?  EG if we fix a bug in
StandardAnalyzer, we will default it to fixed for new users and expect
you on upgrading to read CHANGES.txt and change your app to set that
setting to its non-defaulted value.

Mike

Re: Lucene's default settings & back compatibility

Posted by DM Smith <dm...@gmail.com>.

On May 21, 2009, at 7:17 AM, Michael McCandless wrote:

>  1) Default settings can change; we will always choose defaults based
>     on "latest & greatest for new users".  This only affects "runtime
>     behavior".  EG in 2.9, when sorting by field you won't get scores
>     by default.  When we do this we should clearly document the
>     change, and what settings one could use to get back to the old
>     behavior, in CHANGES.txt.

I'd reverse 1 and 2 and note in 1 that the old behavior might be  
deprecated.

>
>  2) An API, once released as deprecated, is fair game to be removed
>     in the next minor release.

I presume you mean that it will be present for at least one full minor  
release. So, if at 3.1.5 a deprecation is introduced, then it won't be  
removed until 3.3 at the earliest, because 3.2 was the first minor  
release in which it appeared at the start. I don't think it is fair to  
expect users to get every last point release.

If so +1 from a user.
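
To make the rule I'm presuming concrete, here is a small sketch. The class
and method names are hypothetical, and it assumes my reading above (a
deprecation must be present for one full minor release) and ignores what
happens at a major release boundary:

```java
// Hypothetical helper illustrating the "one full minor release" reading.
final class DeprecationPolicy {
    private DeprecationPolicy() {}

    /**
     * Returns {major, minor} of the earliest release that may remove an API
     * first deprecated in release major.minor.point.
     */
    static int[] earliestRemoval(int major, int minor, int point) {
        // A deprecation that ships in a point release (point > 0) only
        // appears "at the start" of the following minor release.
        int firstFullMinor = (point == 0) ? minor : minor + 1;
        // It must survive that whole minor release before being removed.
        return new int[] { major, firstFullMinor + 1 };
    }
}
```

With this reading, a deprecation introduced in 3.1.5 and one introduced in
3.2 itself both become removable in 3.3.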

>
> We still only make bug fixes on point releases, support the index file
> format until the next major release -- those don't change.

Is it just the index file format? I would hope that the behavior of  
filters, analyzers and such would not change so as to invalidate an  
index.

-- DM

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

OK so it sounds like we've boiled the proposal down to two concrete
changes to the back-compat policy:

  1) Default settings can change; we will always choose defaults based
     on "latest & greatest for new users".  This only affects "runtime
     behavior".  EG in 2.9, when sorting by field you won't get scores
     by default.  When we do this we should clearly document the
     change, and what settings one could use to get back to the old
     behavior, in CHANGES.txt.

  2) An API, once released as deprecated, is fair game to be removed
     in the next minor release.

We still only make bug fixes on point releases, support the index file
format until the next major release -- those don't change.

Mike

On Wed, May 20, 2009 at 11:34 PM, Shai Erera <se...@gmail.com> wrote:
>> With the new way, you can get the first bug fix release, but then you will
>> quickly be left out of new bug fixes until you update your code.
>
> Mark, apologies for the late reference, but it struck me only after I left
> the computer yesterday. Again, I'm not sure how big of a problem it is.
> Suppose that I upgrade to 2.4 and the next version (6 months?) is 2.9. Then
> a bug is discovered and is fixed in 2.4.1 and 2.9. In addition, 2.9 contains
> all these changes in Collectors. When 2.9 is out I decide not to upgrade to
> 2.9 because I don't have time. When 3.0 comes out (3-4 months later?) I will
> be forced to upgrade. That means ~1 year since I last upgraded my Lucene
> code sections.
> (True, if there will be any bug fixes in 2.9, I will miss them).
>
> How unreasonable is to ask this? Seriously, how many apps are not touched
> *at all* during one year? And even if these code segments are stable and no
> one touches them anymore, how problematic is it to request users to once a
> year do a sort of cleanup and update to their code?
>
>> In other words, we keep deprecated around for only one or two versions.
>
> That is a reasonable approach. Meaning, defaults may change between releases
> because we'd like Lucene users to get the latest & greatest stuff, (and also
> count on the fact not so many out there strongly rely on the defaults?) but
> methods removal/rename should cause a little more trouble, so we can give
> the users one release to arrange their time before they have to do anything.
>
> Maybe the TokenStream API needs to stay deprecated for longer, until we're
> sure everybody is happy with the new API.
>
> Shai
>
> On Thu, May 21, 2009 at 4:23 AM, Grant Ingersoll <gs...@apache.org>
> wrote:
>>
>> On May 20, 2009, at 4:06 PM, Michael McCandless wrote:
>>
>>> On Wed, May 20, 2009 at 3:24 PM, Shai Erera <se...@gmail.com> wrote:
>>>>
>>>> Then why go through all this trouble and not simply change the
>>>> back-compat policy?
>>>
>>> Back-compat is insanely costly, especially the longer it takes us to
>>> get to the next major release...  yet, the specific cost that bothers
>>> me the most is that we hurt our new users because of the back-compat
>>> users.  It hurts Lucene's adoption/growth.
>>>
>>> Another consideration on relaxing policy is that back-compat is well
>>> nigh impossible to actually achieve.  We spend an insane amount of our
>>> energy maintaining back-compat, but then one accidental breakage that
>>> slips through quickly causes many back-compat users to conclude we are
>>> not back-compat.  It's not much bang and a lot of buck.
>>>
>>> It is tempting to change our policy to something like:
>>>
>>>  * Bug fixes only on each 2.4.X release
>>>
>>>  * Anything can change on each 2.X release, but any prior 2.Y index
>>>   format is readable
>>>
>>> I think it's not unreasonable to say "if you want to take advantage of
>>> Lucene's perf improvements and new features, on upgrading you'll have
>>> to recompile, fix APIs, etc.".
>>
>>
>> All reasonable, Mike.  My take is that Lucene has always been pragmatic
>> about darn near everything, except back compat, where we are pretty
>> dogmatic.
>>
>> In general, I think it is reasonable to say that even from 2.x to 2.y we
>> will try to be back compatible, but when we deem it necessary, we reserve
>> the right to change things.  I don't think anyone here is suggesting we
>> would ever do something drastic like a complete overhaul of all the APIs in
>> a version change.  I also think it is reasonable to deprecate things by
>> saying @deprecated Will be removed in 2.Y.  Use coolNewMethod instead.   In
>> other words, we keep deprecated around for only one or two versions.  Of
>> course, the timing can vary.  Things like changing the Document stuff like
>> we've talked about might last longer (or shorter, actually) while minor
>> deprecations may only be kept for one.  The index compatibility stuff is a
>> must.
>>
>> It is probably worthwhile to ask on java-user@ how many people rely on our
>> back compat policies.
>>
>> <tongue in cheek> Of course, we do already support back compat for all
>> versions:  svn checkout
>> http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_3_1/  </tongue in
>> cheek>
>>
>>
>
>

Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

>
> With the new way, you can get the first bug fix release, but then you will
> quickly be left out of new bug fixes until you update your code.

Mark, apologies for the late reference, but it struck me only after I left
the computer yesterday. Again, I'm not sure how big of a problem it is.
Suppose that I upgrade to 2.4 and the next version (6 months?) is 2.9. Then
a bug is discovered and is fixed in 2.4.1 and 2.9. In addition, 2.9 contains
all these changes in Collectors. When 2.9 is out I decide not to upgrade to
2.9 because I don't have time. When 3.0 comes out (3-4 months later?) I will
be forced to upgrade. That means ~1 year since I last upgraded my Lucene
code sections.
(True, if there will be any bug fixes in 2.9, I will miss them).

How unreasonable is it to ask this? Seriously, how many apps are not touched
*at all* during one year? And even if these code segments are stable and no
one touches them anymore, how problematic is it to request users to once a
year do a sort of cleanup and update to their code?

> In other words, we keep deprecated around for only one or two versions.

That is a reasonable approach. Meaning, defaults may change between releases
because we'd like Lucene users to get the latest & greatest stuff, (and also
count on the fact not so many out there strongly rely on the defaults?) but
methods removal/rename should cause a little more trouble, so we can give
the users one release to arrange their time before they have to do anything.

Maybe the TokenStream API needs to stay deprecated for longer, until we're
sure everybody is happy with the new API.

Shai

On Thu, May 21, 2009 at 4:23 AM, Grant Ingersoll <gs...@apache.org> wrote:

>
> On May 20, 2009, at 4:06 PM, Michael McCandless wrote:
>
>> On Wed, May 20, 2009 at 3:24 PM, Shai Erera <se...@gmail.com> wrote:
>>
>>> Then why go through all this trouble and not simply change the
>>> back-compat policy?
>>>
>>
>> Back-compat is insanely costly, especially the longer it takes us to
>> get to the next major release...  yet, the specific cost that bothers
>> me the most is that we hurt our new users because of the back-compat
>> users.  It hurts Lucene's adoption/growth.
>>
>> Another consideration on relaxing policy is that back-compat is well
>> nigh impossible to actually achieve.  We spend an insane amount of our
>> energy maintaining back-compat, but then one accidental breakage that
>> slips through quickly causes many back-compat users to conclude we are
>> not back-compat.  It's not much bang and a lot of buck.
>>
>> It is tempting to change our policy to something like:
>>
>>  * Bug fixes only on each 2.4.X release
>>
>>  * Anything can change on each 2.X release, but any prior 2.Y index
>>   format is readable
>>
>> I think it's not unreasonable to say "if you want to take advantage of
>> Lucene's perf improvements and new features, on upgrading you'll have
>> to recompile, fix APIs, etc.".
>>
>
>
> All reasonable, Mike.  My take is that Lucene has always been pragmatic
> about darn near everything, except back compat, where we are pretty
> dogmatic.
>
> In general, I think it is reasonable to say that even from 2.x to 2.y we
> will try to be back compatible, but when we deem it necessary, we reserve
> the right to change things.  I don't think anyone here is suggesting we
> would ever do something drastic like a complete overhaul of all the APIs in
> a version change.  I also think it is reasonable to deprecate things by
> saying @deprecated Will be removed in 2.Y.  Use coolNewMethod instead.   In
> other words, we keep deprecated around for only one or two versions.  Of
> course, the timing can vary.  Things like changing the Document stuff like
> we've talked about might last longer (or shorter, actually) while minor
> deprecations may only be kept for one.  The index compatibility stuff is a
> must.
>
> It is probably worthwhile to ask on java-user@ how many people rely on our
> back compat policies.
>
> <tongue in cheek> Of course, we do already support back compat for all
> versions:  svn checkout
> http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_3_1/  </tongue
> in cheek>
>
>
>
>

Re: Lucene's default settings & back compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

On May 20, 2009, at 4:06 PM, Michael McCandless wrote:

> On Wed, May 20, 2009 at 3:24 PM, Shai Erera <se...@gmail.com> wrote:
>> Then why go through all this trouble and not simply change the
>> back-compat policy?
>
> Back-compat is insanely costly, especially the longer it takes us to
> get to the next major release...  yet, the specific cost that bothers
> me the most is that we hurt our new users because of the back-compat
> users.  It hurts Lucene's adoption/growth.
>
> Another consideration on relaxing policy is that back-compat is well
> nigh impossible to actually achieve.  We spend an insane amount of our
> energy maintaining back-compat, but then one accidental breakage that
> slips through quickly causes many back-compat users to conclude we are
> not back-compat.  It's not much bang and a lot of buck.
>
> It is tempting to change our policy to something like:
>
>  * Bug fixes only on each 2.4.X release
>
>  * Anything can change on each 2.X release, but any prior 2.Y index
>    format is readable
>
> I think it's not unreasonable to say "if you want to take advantage of
> Lucene's perf improvements and new features, on upgrading you'll have
> to recompile, fix APIs, etc.".

All reasonable, Mike.  My take is that Lucene has always been  
pragmatic about darn near everything, except back compat, where we are  
pretty dogmatic.

In general, I think it is reasonable to say that even from 2.x to 2.y  
we will try to be back compatible, but when we deem it necessary, we  
reserve the right to change things.  I don't think anyone here is  
suggesting we would ever do something drastic like a complete overhaul  
of all the APIs in a version change.  I also think it is reasonable to  
deprecate things by saying @deprecated Will be removed in 2.Y.  Use  
coolNewMethod instead.   In other words, we keep deprecated around for  
only one or two versions.  Of course, the timing can vary.  Things  
like changing the Document stuff like we've talked about might last  
longer (or shorter, actually) while minor deprecations may only be  
kept for one.  The index compatibility stuff is a must.
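
As a sketch of what such a deprecation notice might look like in source
(DeprecationExample and oldMethod are made-up names for illustration, and
"2.Y" is left as the placeholder it is above):

```java
// Hypothetical class, used only to show the shape of the notice.
final class DeprecationExample {
    private DeprecationExample() {}

    /**
     * @deprecated Will be removed in 2.Y. Use {@link #coolNewMethod()} instead.
     */
    @Deprecated
    static int oldMethod() {
        // The old entry point simply delegates until it is removed.
        return coolNewMethod();
    }

    static int coolNewMethod() {
        return 42;
    }
}
```

Naming the removal release in the javadoc tells callers exactly how long
they have, which CHANGES.txt alone does not do at the call site.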

It is probably worthwhile to ask on java-user@ how many people rely on  
our back compat policies.

<tongue in cheek> Of course, we do already support back compat for all  
versions:  svn checkout http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_3_1/ 
   </tongue in cheek>

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Wed, May 20, 2009 at 3:24 PM, Shai Erera <se...@gmail.com> wrote:
> Then why go through all this trouble and not simply change the back-compat
> policy?

OK so let's talk policy now ;)

We need some serious relaxing of the back-compat policy to make the
actsAsVersion proposal pointless.

Ie whenever we want to change a default, eg sorting by field should
not compute scores, IndexWriter should suddenly default autoCommit to
false, IndexReader.open gives you a readOnly reader, MultiTermQuery is
constant score by default (once we fix BQ to do constant score), docs
are scored out-of-order by BQ, stop filter preserves positions, etc.,
we need to be "allowed" (by our policy) to make such changes in the next
dot release.

I want new users on every dot-release to always get the
latest&greatest defaults.  Every change we make needs to be free to
adopt the best defaults.

If we relax our policy enough so that we have full freedom to set
defaults only according to new users, then I agree actsAsVersion is
not needed.

Back-compat is insanely costly, especially the longer it takes us to
get to the next major release...  yet, the specific cost that bothers
me the most is that we hurt our new users because of the back-compat
users.  It hurts Lucene's adoption/growth.

Another consideration on relaxing policy is that back-compat is well
nigh impossible to actually achieve.  We spend an insane amount of our
energy maintaining back-compat, but then one accidental breakage that
slips through quickly causes many back-compat users to conclude we are
not back-compat.  It's not much bang and a lot of buck.

It is tempting to change our policy to something like:

  * Bug fixes only on each 2.4.X release

  * Anything can change on each 2.X release, but any prior 2.Y index
    format is readable

I think it's not unreasonable to say "if you want to take advantage of
Lucene's perf improvements and new features, on upgrading you'll have
to recompile, fix APIs, etc.".

Mike

Re: Lucene's default settings & back compatibility

Posted by Shai Erera <se...@gmail.com>.

Then why go through all this trouble and not simply change the back-compat
policy?

Really, I read some of Grant's responses and I realize that I've upgraded to
2.4 way too long ago. 2.9 is nowhere in sight. It takes a lot of time to
release and during that time there's lots of discussions on the mailing
list, lots of issues and so on. What I'm trying to say is that with the
amount of communication on this mailing list, people have a lot of
opportunities to pick up changes, in addition to the CHANGES file.

In 2.9 we're breaking back-compat, with that "Changes in backward
compatibility" section in CHANGES. So that makes it 2.4 and 2.9 in a row
where back-compat was not delivered as promised.

And how radical is it to ask people to update their code when they upgrade?
Yes, if we were releasing every month, like was suggested previously, I can
understand why it's important. But we're not. So changing my code every 6-9
months is not that bad. Most chances I'll change my code because of other
things, not just Lucene.

To me, all this Settings class (or actsAsVersion) will only complicate
things. If I understand correctly, then in 2.9 we'll have the code
defaulting to "actAs29", with the ability to change it to "actAs24". Doesn't
that mean I need to update my code if I want to retain 2.4 behavior? If I
already touch my code, how complicated is it to really match my app to 2.9?
I mean, how many people write Collectors, and among those - how many
Collectors do they write? We've gone through a hell of a lot of discussions
in 1575 just to protect those who still use HitCollector, but I'm not sure
how many users we actually protected.

First, I think we should seriously consider dropping the "jar drop-in
ability" requirement. I don't see any benefit in keeping it, except for
bug fixes. Second, changes in runtime behavior are usually made to improve
things (such as performance) - so I don't see why we can't ask someone
upgrading to a newer version to take advantage of those improvements.

Grant suggested we discuss the back-compat policy, since if we resolve that
we might not need the Settings or actAs solution. I agree with that
proposal. If we can relax our back-compat policy to the point of just the
index structure (since between us, that's the most expensive thing you can
hit when upgrading a Lucene version) then I don't think we need these
Settings/actAs approaches.

And BTW, the code today is already packed with deprecated methods, which
neither Settings nor actAs will solve. So even by adopting new defaults,
we'll still have trouble with back-compat, since we'll need to deprecate
methods/classes and worse - find alternative names!

We could also decide to have X.0, X.5 and X+1.0 as point releases where
back-compat changes (removing deprecated methods and changing defaults).
That way we'll keep everybody happy, w/o needing to add Settings/actAs or
wait 1-2 years before a major release is out.

Shai

On Wed, May 20, 2009 at 10:10 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Wed, May 20, 2009 at 12:55 PM, Andi Vajda <va...@osafoundation.org>
> wrote:
>
> > I've been watching this thread with interest with my opinion swaying back
> > and forth.
>
> So have I!
>
> > This last comment, though, pushes me to favor the settings class idea
> > because that idea came with the promise of eliminating the combinatorial
> > explosion of constructor and method overloads.
> >
> > In addition, I very much like the idea of having one place list all the
> > coherent configuration choices one can make. No, CHANGES.txt is not it.
> > While it's interesting reading, it reads like a blog. It doesn't tie
> > sensible settings together. It only gives a differential and chronological
> > view of changes.
> >
> > Having version-specific settings classes is a really neat place to list all
> > possible settings in one place with sensible and coherent values for a
> > version.
>
> The thing is... the number of settings will be large over time, and so
> we'll need a hierarchy of classes, or we fallback to Properties w/ the
> hierarchy encoded in the string, but then you have a weakly typed API,
> and you lose the self-documenting (like Grant observed).
>
> Ie, in theory I love the idea of Settings, but in practice, as I start
> to think about the realities of implementing it, I realize it's gonna
> be a big challenge to solve it well.  This goes waaay beyond resolving
> the back-compat vs new users conflict we have today.
>
> Pushing to the way future, I'm also not convinced it's great that I
> have to go to two places (IndexWriter and its *Settings counterpart)
> to manage my "IndexWriter".
>
> I think the idea can work, but I'm realizing it's a huuuge project (vs
> actsAsVersion which is quite simple).
>
> > The same idea could be used for other things than version by the
> > way. It could help in picking one side of a configuration trade off over
> > another.
> >
> > For example:
> >   - a settings for favoring speed of updates over speed of queries if that
> >     makes sense
> >   - a settings for favoring index size over indexing speed
> >   ... and so on.
>
> Right -- Solr is discussing this now, too.  I think this would be
> useful.
>
> > I don't see why this has to be limited just to Lucene version backwards
> > compatibility.
>
> I think we should do "actsAsVersion" today, solely to resolve the
> back-compat vs new users conflict, and continue to explore/discuss
> Settings for these other reasons.
>
> > Oh, and about that: I think we've reached the breaking point
> > about backwards compatibility support a while ago. I recently hit a bug in
> > my code where a commit() call was missing. Before 2.4, flushing the index
> > committed it. Starting with 2.4, this is no longer the case. Yes, this is
> > documented and that helped me fix the bug really quickly but backwards
> > compatible it is not.
>
> Hmm -- I think we should have had flush() just call commit().
>
> > My point here is that we've promised too much
> > backwards compatibility for too long and it's been getting too hard to
> > deliver that promise now.
>
> I think it's high time we release 3.0 then!
>
> Mike
>
>

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Wed, May 20, 2009 at 12:55 PM, Andi Vajda <va...@osafoundation.org> wrote:

> I've been watching this thread with interest with my opinion swaying back
> and forth.

So have I!

> This last comment, though, pushes me to favor the settings class idea
> because that idea came with the promise of eliminating the combinatorial
> explosion of constructor and method overloads.
>
> In addition, I very much like the idea of having one place list all the
> coherent configuration choices one can make. No, CHANGES.txt is not it.
> While it's interesting reading, it reads like a blog. It doesn't tie
> sensible settings together. It only gives a differential and chronological
> view of changes.
>
> Having version-specific settings classes is a really neat place to list all
> possible settings in one place with sensible and coherent values for a
> version.

The thing is... the number of settings will grow large over time, so
we'll need a hierarchy of classes, or we fall back to Properties w/ the
hierarchy encoded in the string, but then you have a weakly typed API
and you lose the self-documenting quality (as Grant observed).
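
To make that trade-off concrete, here is a minimal sketch. The class
IndexWriterSettings, the key "indexWriter.ramBufferSizeMB" and the 16.0
default are assumptions made up for illustration, not existing Lucene API:

```java
import java.util.Properties;

class SettingsStyleDemo {
    // Hypothetical typed settings holder; the name and default are illustrative.
    static final class IndexWriterSettings {
        private double ramBufferSizeMB = 16.0;

        double getRamBufferSizeMB() {
            return ramBufferSizeMB;
        }

        // The compiler checks the name and the type; bad values fail fast.
        IndexWriterSettings setRamBufferSizeMB(double mb) {
            if (mb <= 0) {
                throw new IllegalArgumentException("ramBufferSizeMB must be > 0");
            }
            this.ramBufferSizeMB = mb;
            return this;
        }
    }

    // Weakly typed alternative: the hierarchy lives in the key string.
    static double ramBufferSizeMB(Properties props) {
        return Double.parseDouble(
            props.getProperty("indexWriter.ramBufferSizeMB", "16.0"));
    }

    public static void main(String[] args) {
        IndexWriterSettings typed = new IndexWriterSettings().setRamBufferSizeMB(32.0);
        System.out.println(typed.getRamBufferSizeMB());

        Properties props = new Properties();
        // Misspelled key ("Bufer"): compiles fine and is silently ignored.
        props.setProperty("indexWriter.ramBuferSizeMB", "32.0");
        System.out.println(ramBufferSizeMB(props)); // falls back to the default
    }
}
```

The typed variant documents itself through its method names, at the cost
of one class (or class hierarchy) per component; the Properties variant
stays small but pushes typos and type errors to runtime.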

Ie, in theory I love the idea of Settings, but in practice, as I start
to think about the realities of implementing it, I realize it's gonna
be a big challenge to solve it well.  This goes waaay beyond resolving
the back-compat vs new users conflict we have today.

Pushing to the way future, I'm also not convinced it's great that I
have to go to two places (IndexWriter and its *Settings counterpart)
to manage my "IndexWriter".

I think the idea can work, but I'm realizing it's a huuuge project (vs
actsAsVersion which is quite simple).

> The same idea could be used for other things than version by the
> way. It could help in picking one side of a configuration trade off over
> another.
>
> For example:
>   - a setting for favoring speed of updates over speed of queries, if that
>     makes sense
>   - a setting for favoring index size over indexing speed
>   ... and so on.

Right -- Solr is discussing this now, too.  I think this would be
useful.

> I don't see why this has to be limited just to Lucene version backwards
> compatibility.

I think we should do "actsAsVersion" today, solely to resolve the
back-compat vs new users conflict, and continue to explore/discuss
Settings for these other reasons.

> Oh, and about that: I think we've reached the breaking point
> about backwards compatibility support a while ago. I recently hit a bug in
> my code where a commit() call was missing. Before 2.4, flushing the index
> committed it. Starting with 2.4, this is no longer the case. Yes, this is
> documented and that helped me fix the bug really quickly but backwards
> compatible it is not.

Hmm -- I think we should have had flush() just call commit().

> My point here is that we've promised too much
> backwards compatibility for too long and it's been getting too hard to
> deliver that promise now.

I think it's high time we release 3.0 then!

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Lucene's default settings & back compatibility

Posted by Andi Vajda <va...@osafoundation.org>.

On Wed, 20 May 2009, Michael McCandless wrote:

> On Wed, May 20, 2009 at 11:57 AM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Wed, May 20, 2009 at 11:46 AM, Mark Miller <ma...@gmail.com> wrote:
>>> Marvin Humphrey wrote:
>>>>
>>>> Yeesh, that's evil.  :(
>>>>
>>>> It will be sweet, sweet justice if one of your own projects gets infected
>>>> by the kind of action-at-a-distance bug you're so blithely unconcerned
>>>> about
>>>
>>> Heh. That's a bit over the top. It is evil stuff, but it's much less evil in
>>> this very contained instance than the general case. Much less.
>>>
>>> But still a bit evil with the potential to grow. I'm not any more of a fan of
>>> passing a config to each class though. But I guess from a design point
>>> of view, it does feel a little less evil.
>>
>> Agree.
>>
>> But passing settings around doesn't solve the problem.  Example:  New
>> settings may be chosen by an application for an IndexSearcher that's
>> incompatible with a custom older Query/Weight/Scorer.  There's really
>> no getting around that problem.  I think the static helps solve
>> drop-in compat for a complete working application.  Good components
>> should only be checking the static, not setting it.
>
> Also, this static setting simply tells Lucene how to default settings.
>
> A component/app can still be explicit when creating classes.  EG when
> opening an IndexReader, if one always passes in the readOnly arg then
> the static "actsAsVersion" would not be used.

I've been watching this thread with interest with my opinion swaying back 
and forth.

This last comment, though, pushes me to favor the settings class idea 
because that idea came with the promise of eliminating the combinatorial 
explosion of constructor and method overloads.

In addition, I very much like the idea of having one place list all the 
coherent configuration choices one can make. No, CHANGES.txt is not it. 
While it's interesting reading, it reads like a blog. It doesn't tie 
sensible settings together. It only gives a differential and chronological 
view of changes.

Having version-specific settings classes is a really neat place to list all 
possible settings in one place with sensible and coherent values for a 
version. The same idea could be used for other things than version by the 
way. It could help in picking one side of a configuration trade off over 
another.

For example:
    - a setting for favoring speed of updates over speed of queries, if that
      makes sense
    - a setting for favoring index size over indexing speed
    ... and so on.
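
A rough sketch of what such version-specific settings classes might look
like. Settings, SettingsMatching24 and LatestAndGreatestSettings are the
names floated in this thread; the two settings and their values are
assumptions chosen only to illustrate the shape:

```java
class SettingsSketch {
    // Hypothetical base class: one place that lists every default.
    static abstract class Settings {
        abstract boolean readOnlyReaderByDefault();
        abstract boolean scoreWhenSortingByField();
    }

    // Pinned to the 2.4 behavior as described in this thread.
    static final class SettingsMatching24 extends Settings {
        @Override boolean readOnlyReaderByDefault() { return false; }
        @Override boolean scoreWhenSortingByField() { return true; }
    }

    // Tracks whatever the current release considers best (assumed values).
    static final class LatestAndGreatestSettings extends Settings {
        @Override boolean readOnlyReaderByDefault() { return true; }
        @Override boolean scoreWhenSortingByField() { return false; }
    }

    static String describe(Settings s) {
        return s.getClass().getSimpleName()
            + ": readOnlyReader=" + s.readOnlyReaderByDefault()
            + ", scoreWhenSorting=" + s.scoreWhenSortingByField();
    }

    public static void main(String[] args) {
        System.out.println(describe(new SettingsMatching24()));
        System.out.println(describe(new LatestAndGreatestSettings()));
    }
}
```

A trade-off preset (say, one favoring update speed over query speed) would
just be another subclass overriding the relevant values.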

I don't see why this has to be limited just to Lucene version backwards
compatibility. Oh, and about that: I think we've reached the breaking point 
about backwards compatibility support a while ago. I recently hit a bug in 
my code where a commit() call was missing. Before 2.4, flushing the index 
committed it. Starting with 2.4, this is no longer the case. Yes, this is 
documented and that helped me fix the bug really quickly but backwards 
compatible it is not. My point here is that we've promised too much 
backwards compatibility for too long and it's been getting too hard to 
deliver that promise now.

Andi..

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Wed, May 20, 2009 at 11:57 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Wed, May 20, 2009 at 11:46 AM, Mark Miller <ma...@gmail.com> wrote:
>> Marvin Humphrey wrote:
>>>
>>> Yeesh, that's evil.  :(
>>>
>>> It will be sweet, sweet justice if one of your own projects gets infected
>>> by the kind of action-at-a-distance bug you're so blithely unconcerned
>>> about
>>
>> Heh. That's a bit over the top. It is evil stuff, but it's much less evil in
>> this very contained instance than the general case. Much less.
>>
>> But still a bit evil with the potential to grow. I'm not any more of a fan of
>> passing a config to each class though. But I guess from a design point
>> of view, it does feel a little less evil.
>
> Agree.
>
> But passing settings around doesn't solve the problem.  Example:  New
> settings may be chosen by an application for an IndexSearcher that's
> incompatible with a custom older Query/Weight/Scorer.  There's really
> no getting around that problem.  I think the static helps solve
> drop-in compat for a complete working application.  Good components
> should only be checking the static, not setting it.

Also, this static setting only tells Lucene which defaults to use.

A component/app can still be explicit when creating classes.  E.g., when
opening an IndexReader, if one always passes in the readOnly arg then
the static "actsAsVersion" would never be consulted.
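
A small sketch of that point. The Reader class, the actsAsVersion field
and the 20900 cut-over value are all hypothetical stand-ins, not Lucene
code:

```java
class ExplicitOverrideDemo {
    // Hypothetical global; 20400 stands for 2.4.0 in the integer
    // encoding proposed elsewhere in this thread.
    static int actsAsVersion = 20400;

    static final class Reader {
        final boolean readOnly;

        private Reader(boolean readOnly) {
            this.readOnly = readOnly;
        }

        // No argument: the default is derived from the static.
        static Reader open() {
            return new Reader(actsAsVersion >= 20900);
        }

        // Explicit argument: the static is never consulted.
        static Reader open(boolean readOnly) {
            return new Reader(readOnly);
        }
    }

    public static void main(String[] args) {
        System.out.println(Reader.open().readOnly);     // follows actsAsVersion
        System.out.println(Reader.open(true).readOnly); // caller's choice wins
    }
}
```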

Mike

Re: Lucene's default settings & back compatibility

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Wed, May 20, 2009 at 11:46 AM, Mark Miller <ma...@gmail.com> wrote:
> Marvin Humphrey wrote:
>>
>> Yeesh, that's evil.  :(
>>
>> It will be sweet, sweet justice if one of your own projects gets infected
>> by the kind of action-at-a-distance bug you're so blithely unconcerned
>> about
>
> Heh. That's a bit over the top. It is evil stuff, but it's much less evil in
> this very contained instance than the general case. Much less.
>
> But still a bit evil with the potential to grow. I'm not any more of a fan of
> passing a config to each class though. But I guess from a design point
> of view, it does feel a little less evil.

Agree.

But passing settings around doesn't solve the problem.  Example:  New
settings may be chosen by an application for an IndexSearcher that's
incompatible with a custom older Query/Weight/Scorer.  There's really
no getting around that problem.  I think the static helps solve
drop-in compat for a complete working application.  Good components
should only be checking the static, not setting it.

-Yonik
http://www.lucidimagination.com

Re: Lucene's default settings & back compatibility

Posted by Mark Miller <ma...@gmail.com>.

Marvin Humphrey wrote:
> Yeesh, that's evil.  :(
>
> It will be sweet, sweet justice if one of your own projects gets infected by
> the kind of action-at-a-distance bug you're so blithely unconcerned about

Heh. That's a bit over the top. It is evil stuff, but it's much less evil
in this very contained instance than the general case. Much less.

But still a bit evil with the potential to grow. I'm not any more of a
fan of passing a config to each class though. But I guess from a design
point of view, it does feel a little less evil.

-- 
- Mark

http://www.lucidimagination.com




Re: Lucene's default settings & back compatibility

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Wed, May 20, 2009 at 05:57:49PM +0400, Earwin Burrfoot wrote:

> > What happens when two libraries loaded in the same VM have Lucene as a
> > dependency and set actsAsVersion to conflicting numbers?

> Exactly what happens when you call BooleanQuery.setMaxClauseCount(n)
> from two libraries.
> Last one wins.

Yeesh, that's evil.  :(

It will be sweet, sweet justice if one of your own projects gets infected by
the kind of action-at-a-distance bug you're so blithely unconcerned about.

http://en.wikipedia.org/wiki/Action_at_a_distance_(computer_science)

That was supposed to be a rhetorical question.  To be clear, I consider the
idea of a settable global variable determining library behavior completely
unacceptable.  Changing class load order somewhere in your code shouldn't do
things like change search results (because StopFilters are applied differently
depending on who "won").
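
The failure mode can be sketched in a few lines. GlobalConfig, the two
"library" initializers and the 20900 threshold are hypothetical; the
position-increment check only stands in for any version-dependent default:

```java
class LastOneWinsDemo {
    // Hypothetical process-wide setting shared by every user of the library.
    static final class GlobalConfig {
        static int actsAsVersion = 20400;
    }

    // Two unrelated libraries, each "helpfully" configuring the global.
    static void initLibraryA() { GlobalConfig.actsAsVersion = 20400; }
    static void initLibraryB() { GlobalConfig.actsAsVersion = 20900; }

    // Stand-in for a version-dependent default, e.g. whether a stop
    // filter enables position increments.
    static boolean positionIncrementsEnabled() {
        return GlobalConfig.actsAsVersion >= 20900;
    }

    public static void main(String[] args) {
        initLibraryA();
        initLibraryB();
        System.out.println(positionIncrementsEnabled()); // B ran last

        initLibraryB();
        initLibraryA();
        System.out.println(positionIncrementsEnabled()); // A ran last
    }
}
```

Neither library changed, yet the observable behavior flips with the
initialization order.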

Marvin Humphrey

Re: Lucene's default settings & back compatibility

Posted by Earwin Burrfoot <ea...@gmail.com>.

Exactly what happens when you call BooleanQuery.setMaxClauseCount(n)
from two libraries.
Last one wins.

On Wed, May 20, 2009 at 17:50, Marvin Humphrey <ma...@rectangular.com> wrote:
>> But since 3.0 is a major release anyway, we could change the default
>> of actsAsVersion with each 3.x release (or just set it to 39999) and
>> require that users set actsAsVersion=30000 (or whatever version they
>> are on) in order to get maximum back compatibility.
>
> What happens when two libraries loaded in the same VM have Lucene as a
> dependency and set actsAsVersion to conflicting numbers?
>
> Marvin Humphrey



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785

Re: Lucene's default settings & back compatibility

Posted by Marvin Humphrey <ma...@rectangular.com>.

> But since 3.0 is a major release anyway, we could change the default
> of actsAsVersion with each 3.x release (or just set it to 39999) and
> require that users set actsAsVersion=30000 (or whatever version they
> are on) in order to get maximum back compatibility.

What happens when two libraries loaded in the same VM have Lucene as a
dependency and set actsAsVersion to conflicting numbers?

Marvin Humphrey


Re: Lucene's default settings & back compatibility

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Wed, May 20, 2009 at 7:22 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> So I think you're suggesting something like this: when you use Lucene,
> if you want "latest and greatest" defaults, do nothing.
>
> If instead you want defaults to match a particular past minor release,
> you must call (say) LuceneVersions.setVersion(VERSION_21).

Either way would work - we could reverse it for stronger back compat if desired.
For 3.0, and all 3.x releases, set actsAsVersion=30000 by default in Lucene.
A program could set actsAsVersion=LUCENE_VERSION_ANY (999999) and always
get new behavior, or just choose the specific version it is tested and
developed with, e.g. actsAsVersion=30201 to get the behavior changes of 3.2.1.

But since 3.0 is a major release anyway, we could change the default
of actsAsVersion with each 3.x release (or just set it to 39999) and
require that users set actsAsVersion=30000 (or whatever version they
are on) in order to get maximum back compatibility.

For 2.9, we could start changing behavior and default
actsAsVersion=20401 (or 20499?) to act like the latest 2.4.x release.
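
For concreteness, a sketch of the integer scheme. The encoding
(major * 10000 + minor * 100 + patch) is inferred from the examples
above, such as 30201 for 3.2.1; the helper names are made up:

```java
class VersionCodeDemo {
    // Value taken from the message above; everything else is an assumption.
    static final int LUCENE_VERSION_ANY = 999999;

    // Assumed encoding: two decimal digits each for minor and patch.
    static int encode(int major, int minor, int patch) {
        if (major < 0 || minor < 0 || minor > 99 || patch < 0 || patch > 99) {
            throw new IllegalArgumentException("component out of range");
        }
        return major * 10000 + minor * 100 + patch;
    }

    // A behavior change introduced in some release is active when the
    // configured actsAsVersion is at least that release.
    static boolean isActive(int actsAsVersion, int introducedIn) {
        return actsAsVersion >= introducedIn;
    }

    public static void main(String[] args) {
        System.out.println(encode(3, 2, 1)); // 30201
        System.out.println(encode(2, 4, 1)); // 20401
        System.out.println(isActive(encode(3, 0, 0), encode(3, 2, 1)));    // false
        System.out.println(isActive(LUCENE_VERSION_ANY, encode(3, 2, 1))); // true
    }
}
```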

And we could still leisurely proceed with Settings classes where they
made sense.

-Yonik
http://www.lucidimagination.com

Re: Lucene's default settings & back compatibility

Posted by Mark Miller <ma...@gmail.com>.

Michael McCandless wrote:
> On Tue, May 19, 2009 at 4:50 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>
>>> Right, that's exactly why I want to fix it (only one behavior allowed
>>> and so for all of 2.* we must match the 2.0 behavior).
>>
>> I meant one jar per JVM gives you one behavior (as is the case now).
>> But by setting a static actsAs version number, you could get a 2.* jar
>> to behave as if it were 2.0, even as behaviors evolve.
>
> So I think you're suggesting something like this: when you use Lucene,
> if you want "latest and greatest" defaults, do nothing.
>
> If instead you want defaults to match a particular past minor release,
> you must call (say) LuceneVersions.setVersion(VERSION_21).
>
> Any place inside Lucene that has defaults that need to vary by version
> would then check this, and act accordingly.
>
> I absolutely love the simplicity of this solution (far simpler than
> *Settings classes).  It would achieve what I'm aiming for, which is to
> always be free on every minor release to set the defaults for new
> users to the latest & greatest.
>
> But:
>
>   1) It means any usage of Lucene inside the JRE must share that same
>      version default
>
>   2) It's a change to our back-compat policy, in that it requires the
>      app to declare what version compatibility it requires.
>
> On #1, maybe this is in fact just fine, since as you pointed out
> that's de-facto what we have today; it's just that the "actsAs" is
> hardwired to 2.0 for all 2.x releases.
>
> On #2, I think shifting the burden of setting actsAs onto those apps
> that do in fact need strict back-compat on upgrading is a good change
> to our policy.  After all, we think such users are the minority, and
> putting the burden on new users of Lucene seems unreasonable.
>
> So net/net I'm +1!
>
> Mike
>   
I kind of like it too. I like the core proposal, but I am not a big fan 
of having to pass a settings class to each of the major Lucene classes. 
A single static call would be much preferable.

- Mark


Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, May 19, 2009 at 4:50 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:

>> Right, that's exactly why I want to fix it (only one behavior allowed
>> and so for all of 2.* we must match the 2.0 behavior).
>
> I meant one jar per JVM gives you one behavior (as is the case now).
> But by setting a static actsAs version number, you could get a 2.* jar
> to behave as if it were 2.0, even as behaviors evolve.

So I think you're suggesting something like this: when you use Lucene,
if you want "latest and greatest" defaults, do nothing.

If instead you want defaults to match a particular past minor release,
you must call (say) LuceneVersions.setVersion(VERSION_21).

Any place inside Lucene that has defaults that need to vary by version
would then check this, and act accordingly.
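
Roughly like this, as a sketch only. LuceneVersions.setVersion and
VERSION_21 are the names used above; the other constants, the onOrAfter
helper and the example default are assumptions:

```java
class LuceneVersionsSketch {
    // Declaration order doubles as release order (an assumption of this sketch).
    enum Version { VERSION_21, VERSION_24, VERSION_29, LATEST }

    static final class LuceneVersions {
        // Doing nothing leaves the app on "latest and greatest" defaults.
        private static volatile Version current = Version.LATEST;

        static void setVersion(Version v) {
            current = v;
        }

        // Enum.compareTo orders constants by declaration position.
        static boolean onOrAfter(Version v) {
            return current.compareTo(v) >= 0;
        }
    }

    // A default that varies by version: here, scoring while sorting by
    // field is assumed to be switched off starting with 2.9.
    static boolean scoreWhenSortingByField() {
        return !LuceneVersions.onOrAfter(Version.VERSION_29);
    }

    public static void main(String[] args) {
        System.out.println(scoreWhenSortingByField()); // new-user default

        LuceneVersions.setVersion(Version.VERSION_21);
        System.out.println(scoreWhenSortingByField()); // pinned to old behavior
    }
}
```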

I absolutely love the simplicity of this solution (far simpler than
*Settings classes).  It would achieve what I'm aiming for, which is to
always be free on every minor release to set the defaults for new
users to the latest & greatest.

But:

  1) It means any usage of Lucene inside the JRE must share that same
     version default

  2) It's a change to our back-compat policy, in that it requires the
     app to declare what version compatibility it requires.

On #1, maybe this is in fact just fine, since as you pointed out
that's de-facto what we have today; it's just that the "actsAs" is
hardwired to 2.0 for all 2.x releases.

On #2, I think shifting the burden of setting actsAs onto those apps
that do in fact need strict back-compat on upgrading is a good change
to our policy.  After all, we think such users are the minority, and
putting the burden on new users of Lucene seems unreasonable.

So net/net I'm +1!

Mike

Re: Lucene's default settings & back compatibility

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Tue, May 19, 2009 at 4:33 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> On Tue, May 19, 2009 at 2:27 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Tue, May 19, 2009 at 2:04 PM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>> On Tue, May 19, 2009 at 9:34 AM, Yonik Seeley
>>> <yo...@lucidimagination.com> wrote:
>>>
>>>> Selecting backward compatibility vs latest and greatest could be done
>>>> w/o Settings (a simple static int containing the version number to act
>>>> like).  It seems like the Settings debate should be based on its own
>>>> merits.
>>>
>>> But isn't a static int too restrictive?  That means all usage of
>>> Lucene from within this JRE must match that version?
>>
>> Isn't that currently the case though?  One Lucene jar, one behavior.
>
> Right, that's exactly why I want to fix it (only one behavior allowed
> and so for all of 2.* we must match the 2.0 behavior).

I meant one jar per JVM gives you one behavior (as is the case now).
But by setting a static actsAs version number, you could get a 2.* jar
to behave as if it were 2.0, even as behaviors evolve.

I'm not saying that a Settings class is a bad idea - it's just bigger
than the issue of handling strict back compatibility and evolution at
the same time, which could possibly be done in a much simpler manner
w/o any API changes.  Of course if we decided to go with a Settings
class, it would render a static actsAs redundant.

-Yonik

Re: Lucene's default settings & back compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Tue, May 19, 2009 at 2:27 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Tue, May 19, 2009 at 2:04 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> On Tue, May 19, 2009 at 9:34 AM, Yonik Seeley
>> <yo...@lucidimagination.com> wrote:
>>
>>> Selecting backward compatibility vs latest and greatest could be done
>>> w/o Settings (a simple static int containing the version number to act
>>> like).  It seems like the Settings debate should be based on its own
>>> merits.
>>
>> But isn't a static int too restrictive?  That means all usage of
>> Lucene from within this JRE must match that version?
>
> Isn't that currently the case though?  One Lucene jar, one behavior.

Right, that's exactly why I want to fix it (only one behavior allowed
and so for all of 2.* we must match the 2.0 behavior).  We've come
full circle ;)

Ie the status quo is bad since we are forced to hurt new users in
order to preserve back compat, when presumably new users outnumber
back-compat users who are upgrading.

I'd love to default no-scoring when sorting by field, in 2.9, but I
can't, unless we had something along the lines of *Settings, or
"specify the version compat you require" when creating each class.

Mike

Re: Lucene's default settings & back compatibility

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Tue, May 19, 2009 at 2:04 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> On Tue, May 19, 2009 at 9:34 AM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>
>> Selecting backward compatibility vs latest and greatest could be done
>> w/o Settings (a simple static int containing the version number to act
>> like).  It seems like the Settings debate should be based on its own
>> merits.
>
> But isn't a static int too restrictive?  That means all usage of
> Lucene from within this JRE must match that version?

Isn't that currently the case though?  One Lucene jar, one behavior.

-Yonik

Re: Lucene's default settings & back compatibility

Posted by Yonik Seeley <yo...@lucidimagination.com>.

Selecting backward compatibility vs latest and greatest could be done
w/o Settings (a simple static int containing the version number to act
like).  It seems like the Settings debate should be based on its own
merits.

-Yonik

Re: Lucene's default settings & back compatibility

Posted by Jason Rutherglen <ja...@gmail.com>.

Yeah, makes sense. Getting in depth with Lucene, and then seeing
real world usage, most users still do use the defaults. I think
I will try to help with this by writing some wiki pages on new
features. Probably this OldSettings/NewSettings model is a good
start for a wiki page?

Our current wiki FAQ is a bit long, so it should help to have a
new page that goes over configurations for different goals.

On Mon, May 18, 2009 at 2:21 PM, Michael Busch <bu...@gmail.com> wrote:

> +1. this would be great!
>
>  Michael
>
>
> On May 18, 2009, at 2:06 PM, Michael McCandless <lu...@mikemccandless.com>
> wrote:
>
>> As we all know, Lucene's back-compat policy necessarily hurts the
>> out-of-the-box experience for new users: because we are only allowed
>> to make substantial improvements to Lucene's default settings at a major
>> release, new users won't see the improvements to our settings until a
>> major release (typically years apart).
>>
>> Lucene has a number of default settings, eg some recent examples:
>>
>>  * Read-only IndexReader gives much better performance with threads,
>>   yet we must now default IndexReader.open to return a non-readOnly
>>   reader
>>
>>  * We can now optionally turn off scoring when sorting by field
>>   (sizable speed gain), but we had to leave it on by default until
>>   3.0
>>
>>  * Letting IndexReader.norms return null
>>
>>  * LogMergePolicy now takes deletions into account, but we had to
>>   disable it by default, since it could conceivably break back
>>   compat.
>>
>>  * Bug fixes in StandardAnalyzer must be delayed until 3.0 since
>>   there's a remote chance they'd break back compat in an app, or we
>>   end up adding confusing methods like "public static void
>>   setDefaultReplaceInvalidAcronym".
>>
>>  * NIOFSDirectory ought to be "the default" on UNIX, but it's not
>>
>>  * Constant score rewrite ought to be the default for most multi-term
>>   queries
>>
>>  * StopFilter should enable position increments by default
>>
>> The fact that we are "forced" to delay such "out of the box" improvements
>> to Lucene for so long is a frustrating cost, since it can only stunt
>> Lucene's adoption and growth and my sense is that it's a minority of
>> Lucene's users that need such strict back-compat (this has been
>> discussed before).  It also clutters our APIs because we end up
>> creating setter/getters that often only exist for the sake of a back
>> compat preservation of a bug.
>>
>> I think we can fix this.  Ie, maintain our strong back-compat policy,
>> yet still allow new users to experience the best of Lucene on every
>> release (not just on major releases), by creating an explicit class
>> that holds settings/defaults used by Lucene.
>>
>> For example, say we create a base class named Settings.  It holds the
>> defaults for settings across all of Lucene's classes. When you create
>> IndexReader, IndexWriter and others, you must pass in a Settings
>> instance.
>>
>> A subclass, SettingsMatching24, binds all settings to "match" 2.4's
>> behavior.  When we make improvements in 2.9, we'd add the back-compat
>> settings to SettingsMatching24.  So if your app wants to keep exactly
>> 2.4's behavior, you'd pass in SettingsMatching24().  On upgrading to
>> 2.9 you'd still see 2.4's behavior.
>>
>> Users who'd like to see Lucene's improvements on each minor release
>> would instead instantiate LatestAndGreatestSettings() (or
>> CurrentVersionSettings(), or something), understanding that when they
>> upgrade there might be biggish changes to Lucene's defaults.  My guess
>> is most users would use this settings class.
>>
>> Doug actually suggested this exact idea a while back:
>>
>>  http://www.gossamer-threads.com/lists/lucene/java-dev/54421#54421.
>>
>> Now that I realize we could use this to strongly decouple "users
>> wanting precise back-compat" from "users wanting the latest &
>> greatest", I think it's a very compelling solution.
>>
>> If we do this I'd like to do it in 2.9, so that starting with 3.x we
>> are free to change default settings w/o breaking back compat.
>>
>> Thoughts?
>>
>> Mike

Re: Lucene's default settings & back compatibility

Posted by Michael Busch <bu...@gmail.com>.

+1. this would be great!

  Michael

On May 18, 2009, at 2:06 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> As we all know, Lucene's back-compat policy necessarily hurts the
> out-of-the-box experience for new users: because we are only allowed
> to make substantial improvements to Lucene's default settings at a major
> release, new users won't see the improvements to our settings until a
> major release (typically years apart).
>
> Lucene has a number of default settings, eg some recent examples:
>
>  * Read-only IndexReader gives much better performance with threads,
>    yet we must now default IndexReader.open to return a non-readOnly
>    reader
>
>  * We can now optionally turn off scoring when sorting by field
>    (sizable speed gain), but we had to leave it on by default until
>    3.0
>
>  * Letting IndexReader.norms return null
>
>  * LogMergePolicy now takes deletions into account, but we had to
>    disable it by default, since it could conceivably break back
>    compat.
>
>  * Bug fixes in StandardAnalyzer must be delayed until 3.0 since
>    there's a remote chance they'd break back compat in an app, or we
>    end up adding confusing methods like "public static void
>    setDefaultReplaceInvalidAcronym".
>
>  * NIOFSDirectory ought to be "the default" on UNIX, but it's not
>
>  * Constant score rewrite ought to be the default for most multi-term
>    queries
>
>  * StopFilter should enable position increments by default
>
> The fact that we are "forced" to delay such "out of the box" improvements
> to Lucene for so long is a frustrating cost, since it can only stunt
> Lucene's adoption and growth and my sense is that it's a minority of
> Lucene's users that need such strict back-compat (this has been
> discussed before).  It also clutters our APIs because we end up
> creating setter/getters that often only exist for the sake of a back
> compat preservation of a bug.
>
> I think we can fix this.  Ie, maintain our strong back-compat policy,
> yet still allow new users to experience the best of Lucene on every
> release (not just on major releases), by creating an explicit class
> that holds settings/defaults used by Lucene.
>
> For example, say we create a base class named Settings.  It holds the
> defaults for settings across all of Lucene's classes. When you create
> IndexReader, IndexWriter and others, you must pass in a Settings
> instance.
>
> A subclass, SettingsMatching24, binds all settings to "match" 2.4's
> behavior.  When we make improvements in 2.9, we'd add the back-compat
> settings to SettingsMatching24.  So if your app wants to keep exactly
> 2.4's behavior, you'd pass in SettingsMatching24().  On upgrading to
> 2.9 you'd still see 2.4's behavior.
>
> Users who'd like to see Lucene's improvements on each minor release
> would instead instantiate LatestAndGreatestSettings() (or
> CurrentVersionSettings(), or something), understanding that when they
> upgrade there might be biggish changes to Lucene's defaults.  My guess
> is most users would use this settings class.
>
> Doug actually suggested this exact idea a while back:
>
>  http://www.gossamer-threads.com/lists/lucene/java-dev/54421#54421.
>
> Now that I realize we could use this to strongly decouple "users
> wanting precise back-compat" from "users wanting the latest &
> greatest", I think it's a very compelling solution.
>
> If we do this I'd like to do it in 2.9, so that starting with 3.x we
> are free to change default settings w/o breaking back compat.
>
> Thoughts?
>
> Mike
