Posted to dev@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2008/01/17 13:51:41 UTC

Back Compatibility

I have been thinking for a while that it is time we revisit our
back-compatibility "policy"
(http://wiki.apache.org/lucene-java/BackwardsCompatibility) in terms of
maybe becoming a little leaner and also in terms of addressing some
issues that come up from time to time in relation to bug fixes that
affect how Tokens are produced. As examples of the latter, see
https://issues.apache.org/jira/browse/LUCENE-1084 and
https://issues.apache.org/jira/browse/LUCENE-1100. Examples of the
former include removing deprecations sooner and being able to add new
methods to interfaces (neither of which should be done ad hoc).

In the case of bugs, the main issue is that people may be expecting
the "incorrect" behavior (admittedly, the maxFieldLength is not
incorrect), so the question becomes should we be in the business of
preserving incorrect values for a full version?

In the case of being "leaner", there are times when it would be useful
to be able to add new methods to interfaces w/o waiting for a full
major release (other projects do this) and also being able to pare
down the deprecated methods sooner.

I propose a couple of solutions to the leaner issue, but I am not sure
how to handle the incorrectness issue, although I suppose it could be
similar. With all of this, I really think the issue comes down to how
we communicate current and future changes to our users.

1. We add a new section to CHANGES for each release, at the top where
we can declare what deprecations will be removed in the _next_ release
(major or minor) and also any interface API changes
2. When deprecating, the @deprecated tag should declare what version it
will be removed in, and that version must be one greater than the next
targeted release. That is, if the next release is 2.4, then anything
deprecated in 2.3 is game to be removed in 2.5.
3. Other ways of communicating changes????
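
As an illustration of item 2, a deprecation that announces its own
removal version could look roughly like the sketch below. The class,
its methods, and the version numbers are hypothetical placeholders and
are not taken from Lucene. The @Deprecated annotation also assumes
Java 5 or later; on older source levels only the Javadoc tag would
apply.

```java
/**
 * Hypothetical class used only to illustrate the proposed convention.
 * Neither the class nor its methods exist in Lucene.
 */
class ExampleSettings {
    private int maxTokens = 10000;

    /**
     * @deprecated Deprecated in 2.3, scheduled for removal in 2.5.
     *             Use {@link #setMaxTokens(int)} instead.
     */
    @Deprecated
    void setMaxFieldTokens(int n) {
        setMaxTokens(n); // the old method only delegates until it is removed
    }

    void setMaxTokens(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        this.maxTokens = n;
    }

    int getMaxTokens() {
        return maxTokens;
    }
}
```

Stating the removal version in the tag itself means the information
travels with the generated Javadoc, in addition to whatever is written
in CHANGES.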

My reasoning for this solution: Our minor release cycles are
currently in the 3-6 months range and our major release cycles are in
the 1-1.5 year range. I think giving someone 4-8 (or whatever) months
is more than enough time to prepare for API changes. I am not sure
how this would affect Index changes, but I do think we should KEEP our
current index reading policy where possible. This may mean that some
deprecated items cannot be removed until a major release and I think
that is fine.

Do people think that the bug issue also fits into this way of doing
things? Or do we need another way to think about those?

These are just suggestions and I am interested in hearing more about
what people think. I know, in some sense, it may make us less stable,
but I doubt it given the time frame of our releases. I also know a
perfectly valid response is "If it ain't broke, don't fix it" and to a
great extent, I know it ain't broke. And believe me, I am fine with
that. I am just wondering if there is an opportunity to make Lucene
better.

Cheers,
Grant

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

I will do so.

On Jan 24, 2008, at 12:44 PM, DM Smith wrote:

> This is now a hijacked thread. It is very interesting, but it may  
> be hard to find again. Wouldn't it be better to record this thread  
> differently, perhaps opening a Jira issue to add XA to Lucene?
>
> -- DM
>
> Doron Cohen wrote:
>> On Jan 24, 2008 6:55 PM, robert engels <re...@ix.netcom.com> wrote:
>>
>>
>>> Thanks, you are correct, but I am not sure it covers the complete  
>>> case.
>>>
>>> Change it a bit to be:
>>>
>>> A opens reader.
>>> B opens reader.
>>> A performs query decides a new document is needed
>>> B performs query decides a new document is needed
>>> B gets writer, adds document, closes
>>> A gets writer, adds document, closes
>>>
>>> There needs to be a way to manually serialize these operations. I
>>> assume I should just do this:
>>>
>>> A gets writer
>>> B gets writer - can't so blocked
>>> A opens reader
>>> A performs query decides a new document is needed
>>> A adds document
>>> A closes reader
>>> A closes writer
>>> B now gets writer
>>> B opens reader
>>> B performs query sees a new document is not needed
>>> B closes reader
>>> B closes writer
>>>
>>> Previously, with the read locks, I did not think you could open the
>>> reader after you had the write lock.
>>>
>>> Am I correct here?
>>>
>>
>>
>> If  I understand you correctly then yes and no :-)
>>
>> "Yes" in the sense that this would work and achieve the
>> required serialization, and "no" in that you could always open
>> readers whether there was an open writer or not.
>>
>> The current locking logic with readers is that opening a reader does
>> not require acquiring any lock. Only when attempting to use the  
>> reader
>> for a write operation (e.g. delete) the reader becomes a writer, and
>> for that it (1) acquires a write lock and (2) verifies that the
>> index was not modified by any writer since the reader was
>> first opened (or else it throws that stale exception).
>>
>> Prior to lockless-commit there were two lock types - write-lock and
>> commit-lock. The commit-lock was used only briefly - during file  
>> opening
>> during reader-opening, to guarantee that no writer modifies the  
>> files that
>> the
>> reader is reading (especially the segments file). Lockless commits
>> got rid
>> of the commit lock (mainly by changing to never modify a file once  
>> it was
>> written.) Write locks are still in use, but only for writers, as  
>> described
>> above.
>> (Mike feel free to correct me here...)
>>
>>
>
>

Re: Back Compatibility

Posted by DM Smith <dm...@gmail.com>.

This is now a hijacked thread. It is very interesting, but it may be 
hard to find again. Wouldn't it be better to record this thread 
differently, perhaps opening a Jira issue to add XA to Lucene?

-- DM

Doron Cohen wrote:
> On Jan 24, 2008 6:55 PM, robert engels <re...@ix.netcom.com> wrote:
>
>   
>> Thanks, you are correct, but I am not sure it covers the complete case.
>>
>> Change it a bit to be:
>>
>> A opens reader.
>> B opens reader.
>> A performs query decides a new document is needed
>> B performs query decides a new document is needed
>> B gets writer, adds document, closes
>> A gets writer, adds document, closes
>>
>> There needs to be a way to manually serialize these operations. I
>> assume I should just do this:
>>
>> A gets writer
>> B gets writer - can't so blocked
>> A opens reader
>> A performs query decides a new document is needed
>> A adds document
>> A closes reader
>> A closes writer
>> B now gets writer
>> B opens reader
>> B performs query sees a new document is not needed
>> B closes reader
>> B closes writer
>>
>> Previously, with the read locks, I did not think you could open the
>> reader after you had the write lock.
>>
>> Am I correct here?
>>     
>
>
> If  I understand you correctly then yes and no :-)
>
> "Yes" in the sense that this would work and achieve the
> required serialization, and "no" in that you could always open
> readers whether there was an open writer or not.
>
> The current locking logic with readers is that opening a reader does
> not require acquiring any lock. Only when attempting to use the reader
> for a write operation (e.g. delete) the reader becomes a writer, and
> for that it (1) acquires a write lock and (2) verifies that the
> index was not modified by any writer since the reader was
> first opened (or else it throws that stale exception).
>
> Prior to lockless-commit there were two lock types - write-lock and
> commit-lock. The commit-lock was used only briefly - during file opening
> during reader-opening, to guarantee that no writer modifies the files that
> the
> reader is reading (especially the segments file). Lockless commits got rid
> of the commit lock (mainly by changing to never modify a file once it was
> written.) Write locks are still in use, but only for writers, as described
> above.
> (Mike feel free to correct me here...)
>
>   



Re: Back Compatibility

Posted by Doron Cohen <cd...@gmail.com>.

On Jan 24, 2008 6:55 PM, robert engels <re...@ix.netcom.com> wrote:

> Thanks, you are correct, but I am not sure it covers the complete case.
>
> Change it a bit to be:
>
> A opens reader.
> B opens reader.
> A performs query decides a new document is needed
> B performs query decides a new document is needed
> B gets writer, adds document, closes
> A gets writer, adds document, closes
>
> There needs to be a way to manually serialize these operations. I
> assume I should just do this:
>
> A gets writer
> B gets writer - can't so blocked
> A opens reader
> A performs query decides a new document is needed
> A adds document
> A closes reader
> A closes writer
> B now gets writer
> B opens reader
> B performs query sees a new document is not needed
> B closes reader
> B closes writer
>
> Previously, with the read locks, I did not think you could open the
> reader after you had the write lock.
>
> Am I correct here?

If  I understand you correctly then yes and no :-)

"Yes" in the sense that this would work and achieve the
required serialization, and "no" in that you could always open
readers whether there was an open writer or not.

The current locking logic with readers is that opening a reader does
not require acquiring any lock. Only when attempting to use the reader
for a write operation (e.g. delete) the reader becomes a writer, and
for that it (1) acquires a write lock and (2) verifies that the
index was not modified by any writer since the reader was
first opened (or else it throws that stale exception).
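
The two steps described above (take the write lock on the first write
operation, then check that the index has not changed since the reader
was opened) can be modelled in a few lines of plain Java. This is only
a toy model under stated assumptions: ToyIndex, ToyReader and
ToyStaleException are made-up names, and none of this is Lucene's
actual code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Toy stand-in for an index. All names here are hypothetical, not Lucene API. */
class ToyIndex {
    final AtomicLong version = new AtomicLong(0);               // bumped on every commit
    final AtomicBoolean writeLocked = new AtomicBoolean(false); // plays the role of the write lock
}

/** Thrown when a reader tries to write against an index that has since changed. */
class ToyStaleException extends RuntimeException {
    ToyStaleException(String message) {
        super(message);
    }
}

class ToyReader {
    private final ToyIndex index;
    private final long versionAtOpen;
    private boolean holdsWriteLock = false;
    private int pendingDeletes = 0;

    ToyReader(ToyIndex index) {
        this.index = index;                       // opening acquires no lock
        this.versionAtOpen = index.version.get(); // remember which commit we saw
    }

    /** First write operation: (1) take the write lock, (2) verify freshness. */
    void delete() {
        if (!holdsWriteLock) {
            if (!index.writeLocked.compareAndSet(false, true)) {
                throw new IllegalStateException("write lock is held by someone else");
            }
            if (index.version.get() != versionAtOpen) {
                index.writeLocked.set(false);     // give the lock back before failing
                throw new ToyStaleException("index changed since this reader was opened");
            }
            holdsWriteLock = true;
        }
        pendingDeletes++;
    }

    /** Commits pending deletes (if any) and releases the write lock. */
    void close() {
        if (holdsWriteLock) {
            if (pendingDeletes > 0) {
                index.version.incrementAndGet();
            }
            index.writeLocked.set(false);
            holdsWriteLock = false;
        }
    }
}
```

With two toy readers opened on the same index, the second one to
attempt a delete after the first has committed gets the stale
exception and has to be re-opened.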

Prior to lockless commits there were two lock types: the write lock and
the commit lock. The commit lock was held only briefly, while a reader
was opening its files, to guarantee that no writer modified the files
the reader was reading (especially the segments file). Lockless commits
got rid of the commit lock (mainly by never modifying a file once it has
been written). Write locks are still in use, but only for writers, as
described above.
(Mike feel free to correct me here...)

Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

Thanks, you are correct, but I am not sure it covers the complete case.

Change it a bit to be:

A opens reader.
B opens reader.
A performs query decides a new document is needed
B performs query decides a new document is needed
B gets writer, adds document, closes
A gets writer, adds document, closes

There needs to be a way to manually serialize these operations. I  
assume I should just do this:

A gets writer
B gets writer - can't so blocked
A opens reader
A performs query decides a new document is needed
A adds document
A closes reader
A closes writer
B now gets writer
B opens reader
B performs query sees a new document is not needed
B closes reader
B closes writer
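
The second sequence above (take the writer first, then read, then
decide) is the usual "check then act under one exclusive lock" pattern.
A minimal sketch in plain Java follows. The class SerializedAdder is
hypothetical, and a ReentrantLock plus an in-memory set merely stand in
for the writer and the index.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper: the lock stands in for "gets writer", the set for the index. */
class SerializedAdder {
    private final ReentrantLock writerLock = new ReentrantLock();
    private final Set<String> docs = new HashSet<>();

    /** Returns true if the document was added, false if it was already present. */
    boolean addIfAbsent(String key) {
        writerLock.lock();                // "gets writer"; a second caller blocks here
        try {
            if (docs.contains(key)) {     // "opens reader, performs query"
                return false;             // "sees a new document is not needed"
            }
            docs.add(key);                // "adds document"
            return true;
        } finally {
            writerLock.unlock();          // "closes writer"
        }
    }

    int size() {
        writerLock.lock();
        try {
            return docs.size();
        } finally {
            writerLock.unlock();
        }
    }
}
```

Because the query and the add happen while the same lock is held, the
second caller always sees the first caller's document and skips the
add.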

Previously, with the read locks, I did not think you could open the  
reader after you had the write lock.

Am I correct here?

On Jan 24, 2008, at 2:13 AM, Doron Cohen wrote:

> On Jan 24, 2008 12:31 AM, robert engels <re...@ix.netcom.com> wrote:
>
>> You must get the write lock before opening the reader if you want
>> transactional consistency and are performing updates.
>>
>> No other way to do it.
>>
>> Otherwise.
>>
>> A opens reader.
>> B opens reader.
>> A performs query decides an update is needed based on results
>> B performs query decides an update is needed based on results
>> B gets write lock
>> B updates
>> B releases
>> A gets write lock
>
>
> Lucene actually protects from this - 'A' would fail to acquire the  
> write
> lock, with a stale-index-exception (this is tested in TestIndexReader -
> testDeleteReaderReaderConflict).
>
>
>> A performs update - ERROR. A is performing an update based on  
>> stale data
>>
>> If A & B want to update an index, it must work as:
>>
>> A gets lock
>> A opens reader
>> A updates
>> A releases lock
>> B gets lock
>> B opens reader
>> B updates
>> B releases lock
>>
>> The only way you can avoid this is if system can determine that B's
>> query results in the first case would not change based on A's  
>> updates.
>>



Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

Sorry, I am using "gets lock" to mean 'opening the index'. I was  
simplifying the procedure.

I think your comment is not correct in this context.

On Jan 24, 2008, at 3:16 AM, Michael McCandless wrote:

> Doron Cohen wrote:
>
>> On Jan 24, 2008 12:31 AM, robert engels <re...@ix.netcom.com>  
>> wrote:
>>
>>> You must get the write lock before opening the reader if you want
>>> transactional consistency and are performing updates.
>>>
>>> No other way to do it.
>>>
>>> Otherwise.
>>>
>>> A opens reader.
>>> B opens reader.
>>> A performs query decides an update is needed based on results
>>> B performs query decides an update is needed based on results
>>> B gets write lock
>>> B updates
>>> B releases
>>> A gets write lock
>>
>>
>> Lucene actually protects from this - 'A' would fail to acquire the  
>> write
>> lock, with a stale-index-exception (this is tested in  
>> TestIndexReader -
>> testDeleteReaderReaderConflict).
>
> Aha, you are right Doron!  Indeed Lucene effectively serializes  
> this case, using the write.lock.
>
>>
>>> A performs update - ERROR. A is performing an update based on  
>>> stale data
>>>
>>> If A & B want to update an index, it must work as:
>>>
>>> A gets lock
>>> A opens reader
>>> A updates
>>> A releases lock
>>> B gets lock
>>> B opens reader
>>> B updates
>>> B releases lock
>>>
>>> The only way you can avoid this is if system can determine that B's
>>> query results in the first case would not change based on A's  
>>> updates.
>
> And, in this case, B will fail when it tries to get the lock.  It  
> must be re-opened so it first sees the changes committed by A.
>
> So, Lucene is transactional, but forces clients to serialize their  
> write operations (ie, one cannot have multiple transactions open at  
> once).
>
> Mike
>

Re: Back Compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

Doron Cohen wrote:

> On Jan 24, 2008 12:31 AM, robert engels <re...@ix.netcom.com> wrote:
>
>> You must get the write lock before opening the reader if you want
>> transactional consistency and are performing updates.
>>
>> No other way to do it.
>>
>> Otherwise.
>>
>> A opens reader.
>> B opens reader.
>> A performs query decides an update is needed based on results
>> B performs query decides an update is needed based on results
>> B gets write lock
>> B updates
>> B releases
>> A gets write lock
>
>
> Lucene actually protects from this - 'A' would fail to acquire the  
> write
> lock, with a stale-index-exception (this is tested in TestIndexReader -
> testDeleteReaderReaderConflict).

Aha, you are right Doron!  Indeed Lucene effectively serializes this  
case, using the write.lock.

>
>> A performs update - ERROR. A is performing an update based on  
>> stale data
>>
>> If A & B want to update an index, it must work as:
>>
>> A gets lock
>> A opens reader
>> A updates
>> A releases lock
>> B gets lock
>> B opens reader
>> B updates
>> B releases lock
>>
>> The only way you can avoid this is if system can determine that B's
>> query results in the first case would not change based on A's  
>> updates.

And, in this case, B will fail when it tries to get the lock.  It  
must be re-opened so it first sees the changes committed by A.

So, Lucene is transactional, but forces clients to serialize their  
write operations (ie, one cannot have multiple transactions open at  
once).
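
From the client's side, "fail, re-open, try again" is an optimistic
retry loop. The sketch below shows the shape of such a loop in plain
Java. ToyStore, Snapshot and RetryingClient are hypothetical names, and
the version counter is an assumption of this toy model rather than a
description of Lucene internals.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical store with a version counter that is bumped on every commit. */
class ToyStore {
    private long version = 0;
    private final List<String> docs = new ArrayList<>();

    /** "Opens a reader": returns a point-in-time copy of the contents. */
    synchronized Snapshot open() {
        return new Snapshot(version, new ArrayList<>(docs));
    }

    /** Commits only if nothing was committed since the snapshot was taken. */
    synchronized boolean commitIfFresh(Snapshot s, String doc) {
        if (s.version != version) {
            return false;                 // stale: the caller must re-open
        }
        docs.add(doc);
        version++;
        return true;
    }
}

class Snapshot {
    final long version;
    final List<String> docs;

    Snapshot(long version, List<String> docs) {
        this.version = version;
        this.docs = docs;
    }
}

class RetryingClient {
    /** Adds doc unless a fresh view already contains it; returns the attempts used. */
    static int addIfMissing(ToyStore store, String doc, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Snapshot s = store.open();            // (re-)open to see the latest commit
            if (s.docs.contains(doc)) {
                return attempt;                   // someone else already added it
            }
            if (store.commitIfFresh(s, doc)) {
                return attempt;                   // our commit went through
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}
```

The important part is that the decision ("is the document missing?")
is always re-made against a freshly opened view, never against the
stale one.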

Mike


Re: Back Compatibility

Posted by Doron Cohen <cd...@gmail.com>.

On Jan 24, 2008 12:31 AM, robert engels <re...@ix.netcom.com> wrote:

> You must get the write lock before opening the reader if you want
> transactional consistency and are performing updates.
>
> No other way to do it.
>
> Otherwise.
>
> A opens reader.
> B opens reader.
> A performs query decides an update is needed based on results
> B performs query decides an update is needed based on results
> B gets write lock
> B updates
> B releases
> A gets write lock


Lucene actually protects from this - 'A' would fail to acquire the write
lock, with a stale-index-exception (this is tested in TestIndexReader -
testDeleteReaderReaderConflict).


> A performs update - ERROR. A is performing an update based on stale data
>
> If A & B want to update an index, it must work as:
>
> A gets lock
> A opens reader
> A updates
> A releases lock
> B gets lock
> B opens reader
> B updates
> B releases lock
>
> The only way you can avoid this is if system can determine that B's
> query results in the first case would not change based on A's updates.
>

Re: Back Compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

Right.

But, that can, and should, be done outside of the Lucene core.

Mike

robert engels wrote:

> You must get the write lock before opening the reader if you want  
> transactional consistency and are performing updates.
>
> No other way to do it.
>
> Otherwise.
>
> A opens reader.
> B opens reader.
> A performs query decides an update is needed based on results
> B performs query decides an update is needed based on results
> B gets write lock
> B updates
> B releases
> A gets write lock
> A performs update - ERROR. A is performing an update based on stale  
> data
>
> If A & B want to update an index, it must work as:
>
> A gets lock
> A opens reader
> A updates
> A releases lock
> B gets lock
> B opens reader
> B updates
> B releases lock
>
> The only way you can avoid this is if system can determine that B's  
> query results in the first case would not change based on A's updates.
>
> On Jan 23, 2008, at 4:03 PM, Michael McCandless wrote:
>
>>
>> robert engels wrote:
>>
>>> Thanks.
>>>
>>> So all writers still need to get the write lock, before opening  
>>> the reader in order to maintain transactional consistency.
>>
>> I don't understand what you mean by "before opening the reader"?   
>> A writer acquires the write.lock before opening.  Readers do not,  
>> unless/until they do their first write operation (deleteDocument/ 
>> setNorm).
>>
>>> Was there performance testing done on the lockless commits with  
>>> heavy contention? I would think that reading the directory to  
>>> find the latest segments file would be slower. Is there a 'latest  
>>> segments' file to avoid this? If not, there probably should be.  
>>> As long as the data fits in a single disk block (which it
>>> should), I don't think you will have a consistency problem.
>>
>> Performance tests were done (see LUCENE-710).
>>
>> And, yes, there is a file segments.gen that records the latest  
>> segment, but it is used along with the directory listing to find  
>> the current segments file.
>>
>> Mike
>>

Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

You must get the write lock before opening the reader if you want  
transactional consistency and are performing updates.

No other way to do it.

Otherwise.

A opens reader.
B opens reader.
A performs query decides an update is needed based on results
B performs query decides an update is needed based on results
B gets write lock
B updates
B releases
A gets write lock
A performs update - ERROR. A is performing an update based on stale data

If A & B want to update an index, it must work as:

A gets lock
A opens reader
A updates
A releases lock
B gets lock
B opens reader
B updates
B releases lock

The only way you can avoid this is if system can determine that B's  
query results in the first case would not change based on A's updates.

On Jan 23, 2008, at 4:03 PM, Michael McCandless wrote:

>
> robert engels wrote:
>
>> Thanks.
>>
>> So all writers still need to get the write lock, before opening  
>> the reader in order to maintain transactional consistency.
>
> I don't understand what you mean by "before opening the reader"?  A  
> writer acquires the write.lock before opening.  Readers do not,  
> unless/until they do their first write operation (deleteDocument/ 
> setNorm).
>
>> Was there performance testing done on the lockless commits with  
>> heavy contention? I would think that reading the directory to find  
>> the latest segments file would be slower. Is there a 'latest  
>> segments' file to avoid this? If not, there probably should be. As  
>> long as the data fits in a single disk block (which it should), I
>> don't think you will have a consistency problem.
>
> Performance tests were done (see LUCENE-710).
>
> And, yes, there is a file segments.gen that records the latest  
> segment, but it is used along with the directory listing to find  
> the current segments file.
>
> Mike
>

Re: Back Compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

robert engels wrote:

> Thanks.
>
> So all writers still need to get the write lock, before opening the  
> reader in order to maintain transactional consistency.

I don't understand what you mean by "before opening the reader"?  A  
writer acquires the write.lock before opening.  Readers do not,  
unless/until they do their first write operation (deleteDocument/ 
setNorm).

> Was there performance testing done on the lockless commits with  
> heavy contention? I would think that reading the directory to find  
> the latest segments file would be slower. Is there a 'latest  
> segments' file to avoid this? If not, there probably should be. As  
> long as the data fits in a single disk block (which it should), I
> don't think you will have a consistency problem.

Performance tests were done (see LUCENE-710).

And, yes, there is a file segments.gen that records the latest  
segment, but it is used along with the directory listing to find the  
current segments file.
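
For the directory-listing half of that lookup, a rough sketch in plain
Java is below. It only picks the highest segments_N name out of a list
of file names. SegmentsFinder is a hypothetical helper, the base-36
encoding of N is an assumption of this sketch, and the way
segments.gen is combined with the listing is not modelled at all.

```java
import java.util.List;

/** Hypothetical helper; not Lucene's implementation. */
class SegmentsFinder {
    static final String PREFIX = "segments_";

    /** Returns the highest generation found in the listing, or -1 if there is none. */
    static long latestGeneration(List<String> fileNames) {
        long max = -1;
        for (String name : fileNames) {
            if (!name.startsWith(PREFIX)) {
                continue;                 // skips "segments.gen" and all data files
            }
            try {
                // Assumption of this sketch: the suffix is the generation in base 36.
                long gen = Long.parseLong(name.substring(PREFIX.length()), Character.MAX_RADIX);
                max = Math.max(max, gen);
            } catch (NumberFormatException ignored) {
                // not a generation suffix; ignore the file
            }
        }
        return max;
    }

    /** Builds the file name for a given generation, using the same encoding. */
    static String fileNameFor(long generation) {
        return PREFIX + Long.toString(generation, Character.MAX_RADIX);
    }
}
```

A reader would then open the file named by fileNameFor(latest) and
fall back to re-listing the directory if that file has disappeared in
the meantime.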

Mike


Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

Thanks.

So all writers still need to get the write lock, before opening the  
reader in order to maintain transactional consistency.

Was there performance testing done on the lockless commits with heavy  
contention? I would think that reading the directory to find the  
latest segments file would be slower. Is there a 'latest segments'  
file to avoid this? If not, there probably should be. As long as the  
data fits in a single disk block (which it should), I don't think you
will have a consistency problem.

On Jan 23, 2008, at 1:40 PM, Michael McCandless wrote:

> robert engels wrote:
>
>> I guess I don't understand what a commit lock is, or what its
>> purpose is. It seems the write lock is all that is needed.
>
> The commit.lock was used to guard access to the "segments" file.  A  
> reader would acquire the lock (blocking out other readers and  
> writers) when reading the file.  And a writer would acquire the  
> lock when writing it.
>
>> If you still need a write lock, then what is the purpose of  
>> "lockless" commits.
>
> Lockless commits got rid of one lock (commit.lock), not write.lock.
>
>> You can get consistency if all writers get the write lock before  
>> performing any read. It would seem this should be the requirement???
>
> In Lucene, you use an IndexReader to do reads (not a writer), which  
> does not block other readers.
>
>> Is there a Wiki or some such thing that discusses the "lockless  
>> commits", their purpose and their implementation? I find the email  
>> thread a bit cumbersome to review.
>
> No, but really the concept is very simple: instead of writing to  
> segments, we write to segments_1, then segments_2, etc.
>
> Mike
>

Re: Back Compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

robert engels wrote:

> I guess I don't understand what a commit lock is, or what its
> purpose is. It seems the write lock is all that is needed.

The commit.lock was used to guard access to the "segments" file.  A  
reader would acquire the lock (blocking out other readers and  
writers) when reading the file.  And a writer would acquire the lock  
when writing it.

> If you still need a write lock, then what is the purpose of  
> "lockless" commits.

Lockless commits got rid of one lock (commit.lock), not write.lock.

> You can get consistency if all writers get the write lock before  
> performing any read. It would seem this should be the requirement???

In Lucene, you use an IndexReader to do reads (not a writer), which  
does not block other readers.

> Is there a Wiki or some such thing that discusses the "lockless  
> commits", their purpose and their implementation? I find the email  
> thread a bit cumbersome to review.

No, but really the concept is very simple: instead of writing to  
segments, we write to segments_1, then segments_2, etc.
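
The write-once idea can be sketched with java.nio alone: every commit
creates a new segments_N file and refuses to touch an existing one.
WriteOnceCommitter is a hypothetical class written for this
illustration. It uses a plain decimal counter and does not reproduce
Lucene's file format or commit protocol.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical committer: each commit goes to a brand-new segments_N file. */
class WriteOnceCommitter {
    private final Path dir;
    private long generation = 0;

    WriteOnceCommitter(Path dir) {
        this.dir = dir;
    }

    /** Writes the next segments_N file; an existing file is never modified. */
    Path commit(String contents) throws IOException {
        long next = generation + 1;
        Path file = dir.resolve("segments_" + next);
        // CREATE_NEW fails with FileAlreadyExistsException if the file exists,
        // so a commit can never overwrite an earlier one.
        Files.write(file, contents.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        generation = next;
        return file;
    }
}
```

Since older files are left untouched, a reader that opened segments_1
keeps a consistent view while a writer produces segments_2, which is
why no lock is needed between readers and the committing writer.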

Mike


Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

I guess I don't understand what a commit lock is, or what its
purpose is. It seems the write lock is all that is needed.

If you still need a write lock, then what is the purpose of
"lockless" commits?

You can get consistency if all writers get the write lock before  
performing any read. It would seem this should be the requirement???

Is there a Wiki or some such thing that discusses the "lockless  
commits", their purpose and their implementation? I find the email  
thread a bit cumbersome to review.


On Jan 23, 2008, at 11:55 AM, Michael McCandless wrote:

>
> robert engels wrote:
>
>> Maybe I don't understand lockless commits then.
>>
>> I just don't think you can enforce transactional consistency  
>> without either 1) locking, or 2) optimistic collision detection. I  
>> could be wrong here, but this has been my experience.
>> By effectively removing the locking requirement, I think you are  
>> going to have users developing code without thought as to what is  
>> going to happen when locking is added. This is going to break the  
>> backwards compatibility that people are striving for.
>
> Lucene still has locking (write.lock), to only allow one writer at  
> a time to make changes to the index (ie, it serializes writer  
> sessions).  Lock-less commits just replaced the old "commit.lock".
>
>> The lucene "writer" structure needs to be something like:
>>
>> start tx for update
>> do work
>> commit
>>
>> where commit is composed of (prepare and commit phases), but  
>> commit may fail.
>
> Right, this is what IndexWriter does now.  It's just that with  
> autoCommit=false you have total control on when that commit takes  
> place (only on closing the writer).
>
>> It is unknown if this can actually happen though, since there is  
>> no unique ID that could cause collisions, but there is the  
>> internal id (which would need to remain constant throughout the tx  
>> in order for queries and delete operations to work).
>
> Yes but there are other errors that Lucene may hit, like disk full,  
> which must (and do) rollback the commit to the start of the  
> transaction (ie, index state when writer was first opened).
>
>> I am sure it is that I don't understand lockless commits, so I  
>> will give a scenario.
>>
>> client A issues query looking for documents with OID (a field) =  
>> "some field";
>> client B issues same query
>> both queries return nothing found
>> client A inserts document with OID = "some field"
>> client B inserts document with OID = "some field"
>>
>> client A commits and client B commits
>>
>> unless B is blocked, once A issues the query, the index is going  
>> to end up with 2 different copies of the document.
>>
>> I understand that Lucene is not a database, and has no concept of  
>> unique constraints. It is my understanding that this has been
>> overcome using locks and sequential access to the index when writing.
>>
>> In a simple XA implementation, client A would open a SERIALIZABLE  
>> transaction, which would block B from even reading the index.   
>> Most simple XA implementation only support READ_COMMITTED,  
>> SERIALIZABLE, and NONE.
>>
>> There are other ways of offering finer grained locking (based on  
>> internal id and timestamps), but most are going to need a "server  
>> based" implementation of lucene to pull off.
>>
>> To summarize, I think the "shared filestore (NFS)" and "lockless  
>> commits" make implementing transactions very difficult. I am sure  
>> I am missing something here, I just don't see what.
>
> Lucene hasn't ever supported that case above: it never blocks a  
> reader from opening the index.  But, you could easily build that on  
> top of Lucene, right?
>
> I'm still trying to understand what you feel is missing in the core  
> that prevents you from building XA (or, your own transactions  
> handling that involves another resource like a DB) on top of Lucene...
>
> Mike

Re: Back Compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

robert engels wrote:

> Maybe I don't understand lockless commits then.
>
> I just don't think you can enforce transactional consistency  
> without either 1) locking, or 2) optimistic collision detection. I  
> could be wrong here, but this has been my experience.
> By effectively removing the locking requirement, I think you are  
> going to have users developing code without thought as to what is  
> going to happen when locking is added. This is going to break the  
> backwards compatibility that people are striving for.

Lucene still has locking (write.lock), to only allow one writer at a  
time to make changes to the index (ie, it serializes writer  
sessions).  Lock-less commits just replaced the old "commit.lock".
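A minimal sketch of how a lock file can serialize writer sessions. This is a toy stand-in, not Lucene's own locking code: the class WriteLockSketch and its helper newScratchIndexDir are hypothetical names, and only the "write.lock" file name comes from the discussion above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Toy stand-in for a "write.lock" file; not Lucene's actual locking code.
class WriteLockSketch {
    private final Path lockFile;
    private boolean held = false;

    WriteLockSketch(Path indexDir) {
        this.lockFile = indexDir.resolve("write.lock");
    }

    // Hypothetical helper: a scratch directory that plays the index directory.
    static Path newScratchIndexDir() {
        try {
            return Files.createTempDirectory("index-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if this writer now holds the lock, false if another one does.
    boolean obtain() {
        try {
            Files.createFile(lockFile); // atomic: throws if the file already exists
            held = true;
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Only the holder removes the lock file.
    void release() {
        if (!held) return;
        try {
            Files.deleteIfExists(lockFile);
            held = false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = newScratchIndexDir();
        WriteLockSketch a = new WriteLockSketch(dir);
        WriteLockSketch b = new WriteLockSketch(dir);
        System.out.println("A obtains: " + a.obtain());
        System.out.println("B obtains while A holds: " + b.obtain());
        a.release();
        System.out.println("B obtains after A releases: " + b.obtain());
        b.release();
    }
}
```

The sketch only covers writers. Readers never touch the lock file, which matches the point that readers are not blocked.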

> The lucene "writer" structure needs to be something like:
>
> start tx for update
> do work
> commit
>
> where commit is composed of (prepare and commit phases), but commit  
> may fail.

Right, this is what IndexWriter does now.  It's just that with  
autoCommit=false you have total control on when that commit takes  
place (only on closing the writer).

> It is unknown if this can actually happen though, since there is no  
> unique ID that could cause collisions, but there is the internal id  
> (which would need to remain constant throughout the tx in order for  
> queries and delete operations to work).

Yes, but there are other errors that Lucene may hit, like disk full,
which must (and do) roll back the commit to the start of the
transaction (ie, index state when writer was first opened).
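The session semantics described here (nothing visible until close, abort discards everything) can be modelled in a few lines. This is a toy model under stated assumptions: ToyIndex and Session are hypothetical classes, a list of strings stands in for the index, and none of this is the IndexWriter API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of a writer session; hypothetical names, not Lucene's IndexWriter.
class WriterSessionSketch {
    // The "index": readers only ever see the committed list.
    static final class ToyIndex {
        private List<String> committed = new ArrayList<>();

        List<String> visibleDocs() {
            return Collections.unmodifiableList(committed);
        }
    }

    static final class Session {
        private final ToyIndex index;
        private final List<String> pending = new ArrayList<>();
        private boolean open = true;

        Session(ToyIndex index) {
            this.index = index;
        }

        // Changes are buffered in the session and stay invisible to readers.
        void addDocument(String doc) {
            if (!open) throw new IllegalStateException("session is closed");
            pending.add(doc);
        }

        // Commit point: build the new state, then publish it in one step.
        void close() {
            if (!open) return;
            List<String> next = new ArrayList<>(index.committed);
            next.addAll(pending);
            index.committed = next;
            open = false;
        }

        // Roll back: the index stays exactly as it was when the session opened.
        void abort() {
            pending.clear();
            open = false;
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Session first = new Session(index);
        first.addDocument("doc-1");
        System.out.println("before close: " + index.visibleDocs());
        first.close();
        System.out.println("after close: " + index.visibleDocs());

        Session second = new Session(index);
        second.addDocument("doc-2");
        second.abort();
        System.out.println("after abort: " + index.visibleDocs());
    }
}
```

Because close() swaps in a freshly built list rather than mutating the old one, a list handed out by visibleDocs() earlier keeps showing the older state, which is the toy equivalent of a reader seeing a point-in-time view.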

> I am sure it is that I don't understand lockless commits, so I will  
> give a scenario.
>
> client A issues query looking for documents with OID (a field) =  
> "some field";
> client B issues same query
> both queries return nothing found
> client A inserts document with OID = "some field"
> client B inserts document with OID = "some field"
>
> client A commits and client B commits
>
> unless B is blocked, once A issues the query, the index is going to  
> end up with 2 different copies of the document.
>
> I understand that Lucene is not a database, and has no concept of  
> unique constraints. It is my understanding that this has been overcome
> using locks and sequential access to the index when writing.
>
> In a simple XA implementation, client A would open a SERIALIZABLE  
> transaction, which would block B from even reading the index.  Most  
> simple XA implementations only support READ_COMMITTED, SERIALIZABLE,
> and NONE.
>
> There are other ways of offering finer grained locking (based on  
> internal id and timestamps), but most are going to need a "server  
> based" implementation of lucene to pull off.
>
> To summarize, I think the "shared filestore (NFS)" and "lockless  
> commits" make implementing transactions very difficult. I am sure I  
> am missing something here, I just don't see what.

Lucene hasn't ever supported that case above: it never blocks a  
reader from opening the index.  But, you could easily build that on  
top of Lucene, right?

I'm still trying to understand what you feel is missing in the core  
that prevents you from building XA (or, your own transactions  
handling that involves another resource like a DB) on top of Lucene...
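One way to read "build that on top of Lucene" for the duplicate-OID scenario is an application-level guard that makes the query and the insert a single step. The sketch below is an assumption about how such a layer might look, not anything the core provides: UniqueOidGuardSketch is a hypothetical class and a HashMap stands in for the index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical application-level wrapper; the map stands in for a Lucene index.
class UniqueOidGuardSketch {
    private final Map<String, String> docsByOid = new HashMap<>();

    // The lookup and the insert run under one lock, so client B cannot slip
    // its insert in between client A's lookup and client A's insert.
    synchronized boolean insertIfAbsent(String oid, String body) {
        if (docsByOid.containsKey(oid)) {
            return false; // this OID is already indexed
        }
        docsByOid.put(oid, body);
        return true;
    }

    synchronized int size() {
        return docsByOid.size();
    }

    // Runs two clients that insert the same OID and counts how many succeed.
    static int raceTwoClients(String oid) {
        UniqueOidGuardSketch guard = new UniqueOidGuardSketch();
        AtomicInteger winners = new AtomicInteger();
        Runnable client = () -> {
            if (guard.insertIfAbsent(oid, "doc body")) {
                winners.incrementAndGet();
            }
        };
        Thread a = new Thread(client);
        Thread b = new Thread(client);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("successful inserts: " + raceTwoClients("some field"));
    }
}
```

This only works when every writer goes through the same guard in the same process. Clients on different machines would need a shared lock or a server in front of the index, which is the concern raised in the quoted text.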

Mike


Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

Maybe I don't understand lockless commits then.

I just don't think you can enforce transactional consistency without  
either 1) locking, or 2) optimistic collision detection. I could be  
wrong here, but this has been my experience.
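For readers unfamiliar with option 2, here is a minimal sketch of optimistic collision detection: each transaction remembers the version it started from, and a commit succeeds only if nobody else has committed since. OptimisticCommitSketch and its methods are hypothetical names used for illustration; nothing here is an existing Lucene API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical version-check commit; illustrates the idea only.
class OptimisticCommitSketch {
    private final AtomicLong version = new AtomicLong(0);

    // A transaction starts by remembering the version it read.
    long begin() {
        return version.get();
    }

    // Succeeds only if no other commit happened since begin().
    // On false the caller has to re-read, redo its work and try again.
    boolean commit(long versionSeenAtBegin) {
        return version.compareAndSet(versionSeenAtBegin, versionSeenAtBegin + 1);
    }

    public static void main(String[] args) {
        OptimisticCommitSketch index = new OptimisticCommitSketch();
        long clientA = index.begin();
        long clientB = index.begin();
        System.out.println("A commits: " + index.commit(clientA));
        System.out.println("B commits on stale version: " + index.commit(clientB));
        System.out.println("B retries: " + index.commit(index.begin()));
    }
}
```

No client is ever blocked. The cost is that the loser of a collision has to repeat its query and its changes.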

By effectively removing the locking requirement, I think you are  
going to have users developing code without thought as to what is  
going to happen when locking is added. This is going to break the  
backwards compatibility that people are striving for.

The lucene "writer" structure needs to be something like:

start tx for update
do work
commit

where commit is composed of (prepare and commit phases), but commit  
may fail.

It is unknown if this can actually happen though, since there is no  
unique ID that could cause collisions, but there is the internal id  
(which would need to remain constant throughout the tx in order for  
queries and delete operations to work).

I am sure it is that I don't understand lockless commits, so I will  
give a scenario.

client A issues query looking for documents with OID (a field) =  
"some field";
client B issues same query
both queries return nothing found
client A inserts document with OID = "some field"
client B inserts document with OID = "some field"

client A commits and client B commits

unless B is blocked, once A issues the query, the index is going to  
end up with 2 different copies of the document.

I understand that Lucene is not a database, and has no concept of  
unique constraints. It is my understanding that this has been overcome
using locks and sequential access to the index when writing.

In a simple XA implementation, client A would open a SERIALIZABLE  
transaction, which would block B from even reading the index.  Most  
simple XA implementations only support READ_COMMITTED, SERIALIZABLE,
and NONE.

There are other ways of offering finer grained locking (based on  
internal id and timestamps), but most are going to need a "server  
based" implementation of lucene to pull off.

To summarize, I think the "shared filestore (NFS)" and "lockless  
commits" make implementing transactions very difficult. I am sure I  
am missing something here, I just don't see what.

On Jan 23, 2008, at 8:53 AM, Mark Miller wrote:

> That's where Robert is confusing me as well. To have XA support you
> just need to be able to define a transaction, atomically commit, or  
> rollback. You also need a consistent state after any of these  
> operations. LUCENE-1044 seems to guarantee that, and so isn't it  
> more like finishing up needed work than going down the wrong path?  
> It seems more to me (and obviously I know a lot less about this  
> than either of you) that you have just gotten Lucene ready to add  
> XA support. Lucene now fulfills all of the requirements. No?  
> Someone just needs to write a boatload of JTA code :)
>
> It would seem the next step would be, as Robert suggests, to make a  
> transaction a first class citizen. The XA protocol will require  
> Lucene to communicate with the TM about what transactions it has  
> completed to help in failure recovery and transaction management. I  
> can certainly see the need for a better transaction abstraction to  
> help with this.
>
> A little enlightenment on this would be great, Robert. I am very
> interested in it for future projects.
>
> And I have to point out...it just seems logical that we would make  
> things so that the index was consistent at some point before taking  
> the next step of making it consistent with other resources...no? I  
> am just still confused about Robert's objections to what is going on
> here. I think that it would be a real leap forward to get it done  
> though.
>
> Also, as he mentioned, we really need a good distributed system  
> that allows for index partitioning. That's the ticket to more
> enterprise adoption. Could be Solr's work though...
>
> Michael McCandless wrote:
>>
>> Robert, besides LUCENE-1044 (syncing on commit), what is the Lucene
>> core missing in order for you (or, someone) to build XA compliance on
>> top of it?
>>
>> Ie, you can open a writer with autoCommit=false and no changes are
>> committed until you close it.  You can abort the session by calling
>> writer.abort().  What's still missing, besides LUCENE-1044?
>>
>> Mike
>>
>> robert engels wrote:
>>
>>> One more example on this. A lot of work was done on transaction  
>>> support. I would argue that this falls way short of what is  
>>> needed, since there is no XA transaction support. Since the  
>>> lucene index (unless stored in an XA db) is a separate resource,  
>>> it really needs XA support in order to be consistent with the  
>>> other resources.
>>>
>>> All of the transaction work that has been performed only  
>>> guarantees that barring a physical hardware failure the lucene  
>>> index can be opened and used at a known state.  This index though  
>>> is probably not consistent with the other resources.
>>>
>>> All that was done is that we can now guarantee that the index is  
>>> consistent at SOME point in time.
>>>
>>> Given the work that was done, we are probably closer to adding XA  
>>> support, but I think this would be much easier if the concept of  
>>> a transaction was made first class through the API (and then XA  
>>> transactions need to be supported).
>>>
>>> On Jan 22, 2008, at 2:49 PM, robert engels wrote:
>>>
>>>> I don't think group C is interested in bug fixes. I just don't  
>>>> see how Lucene is at all useful if the users are encountering  
>>>> any bug - so they either don't use that feature, or they have  
>>>> already developed a work-around (or they have patched the code  
>>>> in a way that avoids the bug, yet is specific to their  
>>>> environment).
>>>>
>>>> For example, I think the NFS work (bugs, fixes, etc.) was quite  
>>>> substantial. I think the actual number of people trying to use  
>>>> NFS is probably very low - as the initial implementation had so  
>>>> many problems (and IMO is not a very good solution for  
>>>> distributed indexes anyway). So all the work in trying to make  
>>>> NFS work "correctly" behind the scenes may have been  
>>>> inefficient, since a more direct, yet major fix may have solved  
>>>> the problem better (like distributed server support, not shared  
>>>> index access).
>>>>
>>>> I just think that trying to maintain API compatibility through  
>>>> major releases is a bad idea. Leads to bloat, and complex code -  
>>>> both internal and external.  In order to achieve great gains in  
>>>> usability and/or performance in a mature product like Lucene  
>>>> almost certainly requires massive changes to the processes,  
>>>> algorithms and structures, and the API should change as well to  
>>>> reflect this.
>>>>
>>>> On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:
>>>>
>>>>>
>>>>> : If they are " no longer actively developing the portion of  
>>>>> the code that's
>>>>> : broken, aren't seeking the new feature, etc", and they stay  
>>>>> back on old
>>>>> : versions... isn't that exactly what we want? They can stay on  
>>>>> the old version,
>>>>> : and new application development uses the newer version.
>>>>>
>>>>> This basically mirrors a philosophy that is rising in the Perl
>>>>> community evangelized by (a really smart dude named chromatic) ...
>>>>> "why are we worrying about the effect of upgrades on users who
>>>>> don't upgrade?"
>>>>>
>>>>> The problem is not all users are created equal and not all  
>>>>> users upgrade
>>>>> for the same reasons or at the same time...
>>>>>
>>>>> Group A: If someone is paranoid about upgrading, and is still running
>>>>> lucene1.4.3 because they are afraid if they upgrade their app will break
>>>>> and they don't want to deal with it; they don't care about known bugs in
>>>>> lucene1.4.3, as long as those bugs haven't impacted them yet -- these
>>>>> people aren't going to care whether we add a bunch of new methods to
>>>>> interfaces, or remove a bunch of public methods from arbitrary releases,
>>>>> because they are never going to see them.  They might do a total rewrite
>>>>> of their project later, and they'll worry about it then (when they have
>>>>> lots of time and QA resources).
>>>>>
>>>>> Group B: At the other extreme are the "free-spirited" developers (god, I
>>>>> hate that the word "agile" has been co-opted) who are always eager to
>>>>> upgrade to get the latest bells and whistles, and don't mind making
>>>>> changes to code and recompiling every time they upgrade -- just as long as
>>>>> there are some decent docs on what to change.
>>>>>
>>>>> Group C: In the middle is a large group of people who are interested in
>>>>> upgrading, who want bug fixes, are willing to write new code to take
>>>>> advantage of new features, in some cases are even willing to make
>>>>> small or medium changes to their code to get really good performance
>>>>> improvements ... but they don't have a lot of time or energy to constantly
>>>>> rewrite big chunks of their app.  For these people, knowing that they can
>>>>> "drop in" the new version and it will work is a big reason why they are
>>>>> willing to upgrade, and why they are willing to spend some time
>>>>> tweaking code to take advantage of the new features and the new
>>>>> performance-enhanced APIs -- because they don't have to spend a lot of time
>>>>> just to get the app working as well as it was before.
>>>>>
>>>>> To draw an analogy...
>>>>>
>>>>> Group A will stand in one place for a really long time no matter how easy
>>>>> the path is.  Once in a great while they will decide to march forward
>>>>> dozens of miles in one big push, but only once they feel they have
>>>>> adequate resources to make the entire trip at once.
>>>>>
>>>>> Group B likes to frolic, and will happily take two steps backward and
>>>>> then 3 steps forward every day.
>>>>>
>>>>> Group C will walk forward with you at a steady pace, and occasionally even
>>>>> take a step back before moving forward, but only if the path is clear and
>>>>> not very steep.
>>>>>
>>>>> : I bet, if you did a poll of all Lucene users, you would find  
>>>>> a majority of
>>>>> : them still only run 1.4.3, or maybe 1.9. Even with 2.0, 2.3,  
>>>>> or 3.0, that is
>>>>> : still going to be the case.
>>>>>
>>>>> That's probably true, but a nice perk of our current backwards
>>>>> compatibility commitments is that when people pop up asking questions
>>>>> about 1.4.3, we can give them advice like "upgrading to 2.0.0 solves your
>>>>> problem" and that advice isn't a death sentence -- the steps to move
>>>>> forward are small and easy.
>>>>>
>>>>> I look at the way things like Maven v1 vs v2 worked out, and how
>>>>> that fractured the community for a long time (as far as I can tell it's
>>>>> still pretty fractured) because the path from v1 to v2 was so steep and
>>>>> involved backtracking so much, and I worry that if we make changes to our
>>>>> "compatibility pledge" that don't allow for an even forward walk, we'll
>>>>> wind up with a heavily fractured community.
>>>>>
>>>>>
>>>>>
>>>>> -Hoss

Re: Back Compatibility

Posted by Yonik Seeley <yo...@apache.org>.

On Jan 23, 2008 9:53 AM, Mark Miller <ma...@gmail.com> wrote:
> Also, as he mentioned, we really need a good distributed system that
> allows for index partitioning. That's the ticket to more enterprise
> adoption. Could be Solr's work though...

Yes, we're working on that :-)

-Yonik


Re: Back Compatibility

Posted by Mark Miller <ma...@gmail.com>.

That's where Robert is confusing me as well. To have XA support you just
need to be able to define a transaction, atomically commit, or rollback. 
You also need a consistent state after any of these operations. 
LUCENE-1044 seems to guarantee that, and so isn't it more like finishing 
up needed work than going down the wrong path? It seems more to me (and 
obviously I know a lot less about this than either of you) that you have 
just gotten Lucene ready to add XA support. Lucene now fulfills all of 
the requirements. No? Someone just needs to write a boatload of JTA code :)

It would seem the next step would be, as Robert suggests, to make a 
transaction a first class citizen. The XA protocol will require Lucene 
to communicate with the TM about what transactions it has completed to 
help in failure recovery and transaction management. I can certainly see 
the need for a better transaction abstraction to help with this.
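To make the prepare/commit split concrete, here is a rough sketch of what a transaction manager does with its resources. The Resource interface below is a simplified, hypothetical contract invented for illustration; the real javax.transaction.xa.XAResource methods take an Xid and cover recovery, which this sketch ignores.

```java
import java.util.Arrays;
import java.util.List;

// Simplified two-phase commit; hypothetical types, not the JTA/XA interfaces.
class TwoPhaseCommitSketch {
    interface Resource {
        boolean prepare();  // vote: true means "ready to commit"
        void commit();
        void rollback();
    }

    // Returns true if every resource committed, false if all were rolled back.
    static boolean run(List<? extends Resource> resources) {
        // Phase 1: collect votes. A single "no" aborts the whole transaction.
        for (Resource r : resources) {
            if (!r.prepare()) {
                for (Resource any : resources) {
                    any.rollback();
                }
                return false;
            }
        }
        // Phase 2: everyone voted yes, so make the changes permanent.
        for (Resource r : resources) {
            r.commit();
        }
        return true;
    }

    // Test double that records which state it ended up in.
    static final class RecordingResource implements Resource {
        private final boolean vote;
        String state = "active";

        RecordingResource(boolean vote) {
            this.vote = vote;
        }

        public boolean prepare() {
            state = vote ? "prepared" : "failed";
            return vote;
        }

        public void commit() {
            state = "committed";
        }

        public void rollback() {
            state = "rolledBack";
        }
    }

    public static void main(String[] args) {
        RecordingResource index = new RecordingResource(true);
        RecordingResource database = new RecordingResource(false);
        boolean ok = run(Arrays.asList(index, database));
        System.out.println("committed: " + ok);
        System.out.println("index: " + index.state + ", database: " + database.state);
    }
}
```

In this picture the index would be one Resource among several, and the missing piece is a prepare step that guarantees the later commit cannot fail.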

A little enlightenment on this would be great, Robert. I am very
interested in it for future projects.

And I have to point out...it just seems logical that we would make 
things so that the index was consistent at some point before taking the 
next step of making it consistent with other resources...no? I am just 
still confused about Robert's objections to what is going on here. I
think that it would be a real leap forward to get it done though.

Also, as he mentioned, we really need a good distributed system that 
allows for index partitioning. That's the ticket to more enterprise
adoption. Could be Solr's work though...

Michael McCandless wrote:
>
> Robert, besides LUCENE-1044 (syncing on commit), what is the Lucene
> core missing in order for you (or, someone) to build XA compliance on
> top of it?
>
> Ie, you can open a writer with autoCommit=false and no changes are
> committed until you close it.  You can abort the session by calling
> writer.abort().  What's still missing, besides LUCENE-1044?
>
> Mike
>
> robert engels wrote:
>
>> One more example on this. A lot of work was done on transaction 
>> support. I would argue that this falls way short of what is needed, 
>> since there is no XA transaction support. Since the lucene index 
>> (unless stored in an XA db) is a separate resource, it really needs 
>> XA support in order to be consistent with the other resources.
>>
>> All of the transaction work that has been performed only guarantees 
>> that barring a physical hardware failure the lucene index can be 
>> opened and used at a known state.  This index though is probably not 
>> consistent with the other resources.
>>
>> All that was done is that we can now guarantee that the index is 
>> consistent at SOME point in time.
>>
>> Given the work that was done, we are probably closer to adding XA 
>> support, but I think this would be much easier if the concept of a 
>> transaction was made first class through the API (and then XA 
>> transactions need to be supported).
>>
>> On Jan 22, 2008, at 2:49 PM, robert engels wrote:
>>
>>> I don't think group C is interested in bug fixes. I just don't see 
>>> how Lucene is at all useful if the users are encountering any bug - 
>>> so they either don't use that feature, or they have already 
>>> developed a work-around (or they have patched the code in a way that 
>>> avoids the bug, yet is specific to their environment).
>>>
>>> For example, I think the NFS work (bugs, fixes, etc.) was quite 
>>> substantial. I think the actual number of people trying to use NFS 
>>> is probably very low - as the initial implementation had so many 
>>> problems (and IMO is not a very good solution for distributed 
>>> indexes anyway). So all the work in trying to make NFS work 
>>> "correctly" behind the scenes may have been inefficient, since a 
>>> more direct, yet major fix may have solved the problem better (like 
>>> distributed server support, not shared index access).
>>>
>>> I just think that trying to maintain API compatibility through major 
>>> releases is a bad idea. Leads to bloat, and complex code - both 
>>> internal and external.  In order to achieve great gains in usability 
>>> and/or performance in a mature product like Lucene almost certainly 
>>> requires massive changes to the processes, algorithms and 
>>> structures, and the API should change as well to reflect this.
>>>
>>> On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:
>>>
>>>>
>>>> : If they are " no longer actively developing the portion of the 
>>>> code that's
>>>> : broken, aren't seeking the new feature, etc", and they stay back 
>>>> on old
>>>> : versions... isn't that exactly what we want? They can stay on the 
>>>> old version,
>>>> : and new application development uses the newer version.
>>>>
>>>> This basically mirrors a philosophy that is rising in the Perl
>>>> community evangelized by (a really smart dude named chromatic) ...
>>>> "why are we worrying about the effect of upgrades on users who don't
>>>> upgrade?"
>>>>
>>>> The problem is not all users are created equal and not all users 
>>>> upgrade
>>>> for the same reasons or at the same time...
>>>>
>>>> Group A: If someone is paranoid about upgrading, and is still running
>>>> lucene1.4.3 because they are afraid if they upgrade their app will break
>>>> and they don't want to deal with it; they don't care about known bugs in
>>>> lucene1.4.3, as long as those bugs haven't impacted them yet -- these
>>>> people aren't going to care whether we add a bunch of new methods to
>>>> interfaces, or remove a bunch of public methods from arbitrary releases,
>>>> because they are never going to see them.  They might do a total rewrite
>>>> of their project later, and they'll worry about it then (when they have
>>>> lots of time and QA resources).
>>>>
>>>> Group B: At the other extreme are the "free-spirited" developers (god, I
>>>> hate that the word "agile" has been co-opted) who are always eager to
>>>> upgrade to get the latest bells and whistles, and don't mind making
>>>> changes to code and recompiling every time they upgrade -- just as long as
>>>> there are some decent docs on what to change.
>>>>
>>>> Group C: In the middle is a large group of people who are interested in
>>>> upgrading, who want bug fixes, are willing to write new code to take
>>>> advantage of new features, in some cases are even willing to make
>>>> small or medium changes to their code to get really good performance
>>>> improvements ... but they don't have a lot of time or energy to constantly
>>>> rewrite big chunks of their app.  For these people, knowing that they can
>>>> "drop in" the new version and it will work is a big reason why they are
>>>> willing to upgrade, and why they are willing to spend some time
>>>> tweaking code to take advantage of the new features and the new
>>>> performance-enhanced APIs -- because they don't have to spend a lot of time
>>>> just to get the app working as well as it was before.
>>>>
>>>> To draw an analogy...
>>>>
>>>> Group A will stand in one place for a really long time no matter how easy
>>>> the path is.  Once in a great while they will decide to march forward
>>>> dozens of miles in one big push, but only once they feel they have
>>>> adequate resources to make the entire trip at once.
>>>>
>>>> Group B likes to frolic, and will happily take two steps backward and
>>>> then 3 steps forward every day.
>>>>
>>>> Group C will walk forward with you at a steady pace, and occasionally even
>>>> take a step back before moving forward, but only if the path is clear and
>>>> not very steep.
>>>>
>>>> : I bet, if you did a poll of all Lucene users, you would find a 
>>>> majority of
>>>> : them still only run 1.4.3, or maybe 1.9. Even with 2.0, 2.3, or 
>>>> 3.0, that is
>>>> : still going to be the case.
>>>>
>>>> That's probably true, but a nice perk of our current backwards
>>>> compatibility commitments is that when people pop up asking questions
>>>> about 1.4.3, we can give them advice like "upgrading to 2.0.0 solves your
>>>> problem" and that advice isn't a death sentence -- the steps to move
>>>> forward are small and easy.
>>>>
>>>> I look at the way things like Maven v1 vs v2 worked out, and how
>>>> that fractured the community for a long time (as far as I can tell it's
>>>> still pretty fractured) because the path from v1 to v2 was so steep and
>>>> involved backtracking so much, and I worry that if we make changes to our
>>>> "compatibility pledge" that don't allow for an even forward walk, we'll
>>>> wind up with a heavily fractured community.
>>>>
>>>>
>>>>
>>>> -Hoss

Re: Back Compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

Robert, besides LUCENE-1044 (syncing on commit), what is the Lucene
core missing in order for you (or, someone) to build XA compliance on
top of it?

Ie, you can open a writer with autoCommit=false and no changes are
committed until you close it.  You can abort the session by calling
writer.abort().  What's still missing, besides LUCENE-1044?

Mike

robert engels wrote:

> One more example on this. A lot of work was done on transaction  
> support. I would argue that this falls way short of what is needed,  
> since there is no XA transaction support. Since the lucene index  
> (unless stored in an XA db) is a separate resource, it really needs  
> XA support in order to be consistent with the other resources.
>
> All of the transaction work that has been performed only guarantees  
> that barring a physical hardware failure the lucene index can be  
> opened and used at a known state.  This index though is probably  
> not consistent with the other resources.
>
> All that was done is that we can now guarantee that the index is  
> consistent at SOME point in time.
>
> Given the work that was done, we are probably closer to adding XA  
> support, but I think this would be much easier if the concept of a  
> transaction was made first class through the API (and then XA  
> transactions need to be supported).
>
> On Jan 22, 2008, at 2:49 PM, robert engels wrote:
>
>> I don't think group C is interested in bug fixes. I just don't see  
>> how Lucene is at all useful if the users are encountering any bug  
>> - so they either don't use that feature, or they have already  
>> developed a work-around (or they have patched the code in a way  
>> that avoids the bug, yet is specific to their environment).
>>
>> For example, I think the NFS work (bugs, fixes, etc.) was quite  
>> substantial. I think the actual number of people trying to use NFS  
>> is probably very low - as the initial implementation had so many  
>> problems (and IMO is not a very good solution for distributed  
>> indexes anyway). So all the work in trying to make NFS work  
>> "correctly" behind the scenes may have been inefficient, since a  
>> more direct, yet major fix may have solved the problem better  
>> (like distributed server support, not shared index access).
>>
>> I just think that trying to maintain API compatibility through  
>> major releases is a bad idea. Leads to bloat, and complex code -  
>> both internal and external.  In order to achieve great gains in  
>> usability and/or performance in a mature product like Lucene  
>> almost certainly requires massive changes to the processes,  
>> algorithms and structures, and the API should change as well to  
>> reflect this.
>>
>> On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:
>>
>>>
>>> : If they are " no longer actively developing the portion of the  
>>> code that's
>>> : broken, aren't seeking the new feature, etc", and they stay  
>>> back on old
>>> : versions... isn't that exactly what we want? They can stay on  
>>> the old version,
>>> : and new application development uses the newer version.
>>>
>>> This basically mirrors a philosophy that is rising in the Perl
>>> community evangelized by (a really smart dude named chromatic) ...
>>> "why are we worrying about the effect of upgrades on users who don't
>>> upgrade?"
>>>
>>> The problem is not all users are created equal and not all users  
>>> upgrade
>>> for the same reasons or at the same time...
>>>
>>> Group A: If someone is paranoid about upgrading, and is still running
>>> lucene1.4.3 because they are afraid if they upgrade their app will break
>>> and they don't want to deal with it; they don't care about known bugs in
>>> lucene1.4.3, as long as those bugs haven't impacted them yet -- these
>>> people aren't going to care whether we add a bunch of new methods to
>>> interfaces, or remove a bunch of public methods from arbitrary releases,
>>> because they are never going to see them.  They might do a total rewrite
>>> of their project later, and they'll worry about it then (when they have
>>> lots of time and QA resources).
>>>
>>> Group B: At the other extreme are the "free-spirited" developers (god, I
>>> hate that the word "agile" has been co-opted) who are always eager to
>>> upgrade to get the latest bells and whistles, and don't mind making
>>> changes to code and recompiling every time they upgrade -- just as long as
>>> there are some decent docs on what to change.
>>>
>>> Group C: In the middle is a large group of people who are interested in
>>> upgrading, who want bug fixes, are willing to write new code to take
>>> advantage of new features, in some cases are even willing to make
>>> small or medium changes to their code to get really good performance
>>> improvements ... but they don't have a lot of time or energy to constantly
>>> rewrite big chunks of their app.  For these people, knowing that they can
>>> "drop in" the new version and it will work is a big reason why they are
>>> willing to upgrade, and why they are willing to spend some time
>>> tweaking code to take advantage of the new features and the new
>>> performance-enhanced APIs -- because they don't have to spend a lot of time
>>> just to get the app working as well as it was before.
>>>
>>> To draw an analogy...
>>>
>>> Group A will stand in one place for a really long time no matter  
>>> how easy
>>> the path is.  Once in a great while they will decide to march  
>>> forward
>>> dozens of miles in one big push, but only once they feel they have
>>> adequate resources to make the entire trip at once.
>>>
>>> Group B likes to frolic, and will happily take two sptens  
>>> backward and
>>> then 3 steps forward every day.
>>>
>>> Group C will walk forward with you at a steady pace, and  
>>> occasionally even
>>> take a step back before moving forward, but only if the path is  
>>> clear and
>>> not very steap.
>>>
>>> : I bet, if you did a poll of all Lucene users, you would find a  
>>> majority of
>>> : them still only run 1.4.3, or maybe 1.9. Even with 2.0, 2.3, or  
>>> 3.0, that is
>>> : still going to be the case.
>>>
>>> That's probably true, but a nice perk of our current backwards
>>> compatibility commitments is that when people pop up asking  
>>> questions
>>> about 1.4.3, we can give them like "upgrading to 2.0.0 solves your
>>> problem" and that advice isn't a death sentence -- the steps to move
>>> forward are small and easy.
>>>
>>> I look at things the way things like Maven v1 vs v2 worked out,  
>>> and how
>>> that fractured the community for a long time (as far as i can  
>>> tell it's
>>> still pretty fractured) because the path from v1 to v2 was so  
>>> steep and
>>> involved backtracking so much and i worry that if we make changes  
>>> to our
>>> "copatibility pledge" that don't allow for an even forward walk,  
>>> we'll
>>> wind up with a heavily fractured community.
>>>
>>>
>>>
>>> -Hoss

Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

A specific example:

You have a criminal justice system that indexes past court cases.

You do a search for cases involving Joe Smith because you are a judge  
and you want to review priors before sentencing. Similar issues with  
related cases, case history, etc.

Is it better to return something that may not be correct, or return  
an error saying the index is offline and is being rebuilt - please  
perform your search later?  In this case old false positives are just  
as bad as missing new records. I hope that demonstrates the position  
clearly.

As I stated, there are several classes of applications where "any
data", whether or not it is current or valid, is acceptable, but I
would argue that in MOST cases this is not so, and if the interested
subjects fully reviewed their requirements they would not accept that
solution. It is easily summarized with the old adage "garbage in,
garbage out".

The only reason that corruption is ok is that you need to reindex  
anyway, and rebuilding from scratch is often faster than determining  
the affected documents and updating (especially if corruption is a  
possibility).

It was in fact me who brought up the issue that none of the
"lockless commits" code fixed anything related to corruption.  The
only way to ensure non-corruption is to sync all data files, then
write and sync the segments file.  I think this change could have
been accomplished in about 10 lines of code, and is completely
independent of lockless commits, and in most cases makes lockless
commits obsolete.  But to be honest, I am not really certain how
lockless commits can actually work in an environment that allows
updates to the documents (and/or related resources), so I am sure
there are aspects I am just ignorant of.
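
[Editorial sketch, not part of the original message.] The ordering described
above (sync every data file first, only then write and sync the file that
refers to them) might look roughly like the following in plain Java NIO.
The class name, method names and the one-name-per-line "segments" format are
assumptions made for illustration; this is not Lucene's actual commit code,
and it does not sync the containing directory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper; names and file format are illustrative only.
final class SyncThenCommit {

    /** Forces a file's content and metadata to stable storage (fsync). */
    static void sync(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    /** Syncs all data files, then writes and syncs the file that lists them. */
    static void commit(List<Path> dataFiles, Path segmentsFile) throws IOException {
        // Step 1: every data file must be durable before anything refers to it.
        for (Path f : dataFiles) {
            sync(f);
        }
        // Step 2: write the listing file, then sync it as well.
        StringBuilder listing = new StringBuilder();
        for (Path f : dataFiles) {
            listing.append(f.getFileName()).append('\n');
        }
        ByteBuffer buf = ByteBuffer.wrap(listing.toString().getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(segmentsFile,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);
        }
    }
}
```

The point of the ordering is that a crash before step 2 completes leaves the
previous listing file in place, so a reader never sees a listing that names
data which did not reach the disk.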

As an aside, we engineered our software years ago to work around
these issues, which is why we still use a 1.9 derivative, and monitor
the trunk for important fixes and enhancements.

On Jan 22, 2008, at 8:35 PM, Mark Miller wrote:

>
>
> robert engels wrote:
>> I think there are a lot of applications using Lucene where
>> "whether it's lost a bit of data or not" is not acceptable.
> Yeah, and I have one of them. Which is why I would love the support
> you're talking about. But it's not there yet and I am just grateful
> that I can get my customers back up and searching as quickly as
> possible rather than experience an index corruption. Access to the
> data is more important than complete access to the data for my
> customers (though they'd say they certainly want both). After such
> an experience I have to run through the database and check if
> anything from the index is missing, and if it is, reindex. Not
> ideal, but what can you do? I find it odd that you don't think
> non-corruption is better than nothing. It's a big feature for me. If
> the server reboots at night and causes a corruption, I have customers
> that will be SOL for some time... I'd prefer that when the server
> reboots, my index - whatever is left - is searchable. My customers
> need to work. Can't get behind on a daily product :)
>
> I'd prefer what you're talking about, but there are tons of other
> things I'd love to see in Lucene as well. It just seems odd to
> complain about them. I'd think that instead, I might spearhead the
> development. Just not experienced enough myself to do a lot of the
> deeper work. You don't appear so limited. How about helping out
> with some transactional support :)

Re: Back Compatibility

Posted by Mark Miller <ma...@gmail.com>.


robert engels wrote:
> I think there are a lot of applications using Lucene where "whether
> it's lost a bit of data or not" is not acceptable.
Yeah, and I have one of them. Which is why I would love the support you're
talking about. But it's not there yet and I am just grateful that I can
get my customers back up and searching as quickly as possible rather than
experience an index corruption. Access to the data is more important
than complete access to the data for my customers (though they'd say they
certainly want both). After such an experience I have to run through the
database and check if anything from the index is missing, and if it is,
reindex. Not ideal, but what can you do? I find it odd that you don't
think non-corruption is better than nothing. It's a big feature for me.
If the server reboots at night and causes a corruption, I have customers
that will be SOL for some time... I'd prefer that when the server reboots,
my index - whatever is left - is searchable. My customers need to work.
Can't get behind on a daily product :)

I'd prefer what you're talking about, but there are tons of other things
I'd love to see in Lucene as well. It just seems odd to complain about
them. I'd think that instead, I might spearhead the development. Just
not experienced enough myself to do a lot of the deeper work. You don't
appear so limited. How about helping out with some transactional support :)



Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

I think there are a lot of applications using Lucene where "whether
it's lost a bit of data or not" is not acceptable.

However, it is probably fine for a web search, or intranet search.

As to your first point, that is why the really great open-source
projects (Eclipse, OpenOffice) have a financial backer that provides
significant direction and contributions.  They wouldn't waste their
resources developing esoteric features with little appeal, and instead
direct their resources to broader features that others can then develop
finer features on top of.

I don't question the abilities of Michael whatsoever - I just wish
they were directed at broader features. The review by the voters (and
this list) allows development to be focused.

Frequently perfectly correct patches are rejected by the voters. Why?  
Because SOMEONE needs to keep the development focused - if not there  
will be chaos.

On Jan 22, 2008, at 4:19 PM, Mark Miller wrote:

> I humbly disagree about NFS. Arguing about where free time was
> invested, or wasted, or inefficient, in an open source project just
> seems silly. One of the great benefits is esoteric work that would
> normally not be allowed for. NFS is easy. A lot of Lucene users
> don't care about Lucene. They just want something easy to set up. It
> especially doesn't make sense when talking about Michael. He seems
> to spit out Lucene code in his sleep. I doubt NFS stuff did
> anything but make him more brilliant at manipulating Lucene. It
> certainly hasn't made him any less prolific.
>
> I am very much in favor of your talk about transactional support. Man
> do I want Lucene to have that. But the fact that we are getting to
> where the index cannot be corrupted is still a great step forward.
> Knowing that my indexes will not be corrupted while running at a
> place that needs access 24/7 is just wonderful. I can get something
> working for them quick, whether it's lost a bit of data or not. Now
> full support to guarantee that my Lucene index is consistent with
> my Database? Even better. I wish. But I am still very thankful for
> the first step of a guaranteed consistent index.
>
> Your glass is always half full ;) I aspire to your crankiness when  
> I get older.
>
> - Mark
>
>
> robert engels wrote:
>> One more example on this. A lot of work was done on transaction  
>> support. I would argue that this falls way short of what is  
>> needed, since there is no XA transaction support. Since the lucene  
>> index (unless stored in an XA db) is a separate resource, it  
>> really needs XA support in order to be consistent with the other  
>> resources.
>>
>> All of the transaction work that has been performed only  
>> guarantees that barring a physical hardware failure the lucene  
>> index can be opened and used at a known state.  This index though  
>> is probably not consistent with the other resources.
>>
>> All that was done is that we can now guarantee that the index is  
>> consistent at SOME point in time.
>>
>> Given the work that was done, we are probably closer to adding XA  
>> support, but I think this would be much easier if the concept of a  
>> transaction was made first class through the API (and then XA  
>> transactions need to be supported).
>>
>> On Jan 22, 2008, at 2:49 PM, robert engels wrote:
>>
>>> I don't think group C is interested in bug fixes. I just don't  
>>> see how Lucene is at all useful if the users are encountering any  
>>> bug - so they either don't use that feature, or they have already  
>>> developed a work-around (or they have patched the code in a way  
>>> that avoids the bug, yet is specific to their environment).
>>>
>>> For example, I think the NFS work (bugs, fixes, etc.) was quite  
>>> substantial. I think the actual number of people trying to use  
>>> NFS is probably very low - as the initial implementation had so  
>>> many problems (and IMO is not a very good solution for  
>>> distributed indexes anyway). So all the work in trying to make  
>>> NFS work "correctly" behind the scenes may have been inefficient,  
>>> since a more direct, yet major fix may have solved the problem  
>>> better (like distributed server support, not shared index access).
>>>
>>> I just think that trying to maintain API compatibility through
>>> major releases is a bad idea. It leads to bloat and complex code,
>>> both internal and external.  Achieving great gains in usability
>>> and/or performance in a mature product like Lucene almost certainly
>>> requires massive changes to the processes, algorithms and
>>> structures, and the API should change as well to reflect this.
>>>
>>> On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:
>>>
>>>>
>>>> : If they are "no longer actively developing the portion of the code that's
>>>> : broken, aren't seeking the new feature, etc", and they stay back on old
>>>> : versions... isn't that exactly what we want? They can stay on the old
>>>> : version, and new application development uses the newer version.
>>>>
>>>> This basically mirrors a philosophy that is rising in the Perl
>>>> community evangelized by (a really smart dude named chromatic) ...
>>>> "why are we worrying about the effect of upgrades on users who don't
>>>> upgrade?"
>>>>
>>>> The problem is not all users are created equal and not all users upgrade
>>>> for the same reasons or at the same time...
>>>>
>>>> Group A: If someone is paranoid about upgrading, and is still running
>>>> lucene1.4.3 because they are afraid if they upgrade their app will break
>>>> and they don't want to deal with it; they don't care about known bugs in
>>>> lucene1.4.3, as long as those bugs haven't impacted them yet -- these
>>>> people aren't going to care whether we add a bunch of new methods to
>>>> interfaces, or remove a bunch of public methods from arbitrary releases,
>>>> because they are never going to see them.  They might do a total rewrite
>>>> of their project later, and they'll worry about it then (when they have
>>>> lots of time and QA resources)
>>>>
>>>> Group B: At the other extreme, are the "free-spirited" developers (god i
>>>> hate that the word "agile" has been co-opted) who are always eager to
>>>> upgrade to get the latest bells and whistles, and don't mind making
>>>> changes to code and recompiling every time they upgrade -- just as long as
>>>> there are some decent docs on what to change.
>>>>
>>>> Group C: In the middle is a large group of people who are interested in
>>>> upgrading, who want bug fixes, are willing to write new code to take
>>>> advantage of new features, in some cases are even willing to make
>>>> small or medium changes to their code to get really good performance
>>>> improvements ... but they don't have a lot of time or energy to constantly
>>>> rewrite big chunks of their app.  For these people, knowing that they can
>>>> "drop in" the new version and it will work is a big reason why they are
>>>> willing to upgrade, and why they are willing to spend some time
>>>> tweaking code to take advantage of the new features and the new
>>>> performance enhanced APIs -- because they don't have to spend a lot of time
>>>> just to get the app working as well as it was before.
>>>>
>>>> To draw an analogy...
>>>>
>>>> Group A will stand in one place for a really long time no matter how easy
>>>> the path is.  Once in a great while they will decide to march forward
>>>> dozens of miles in one big push, but only once they feel they have
>>>> adequate resources to make the entire trip at once.
>>>>
>>>> Group B likes to frolic, and will happily take two steps backward and
>>>> then 3 steps forward every day.
>>>>
>>>> Group C will walk forward with you at a steady pace, and occasionally even
>>>> take a step back before moving forward, but only if the path is clear and
>>>> not very steep.
>>>>
>>>> : I bet, if you did a poll of all Lucene users, you would find a majority of
>>>> : them still only run 1.4.3, or maybe 1.9. Even with 2.0, 2.3, or 3.0, that is
>>>> : still going to be the case.
>>>>
>>>> That's probably true, but a nice perk of our current backwards
>>>> compatibility commitments is that when people pop up asking questions
>>>> about 1.4.3, we can give them advice like "upgrading to 2.0.0 solves your
>>>> problem" and that advice isn't a death sentence -- the steps to move
>>>> forward are small and easy.
>>>>
>>>> I look at the way things like Maven v1 vs v2 worked out, and how
>>>> that fractured the community for a long time (as far as i can tell it's
>>>> still pretty fractured) because the path from v1 to v2 was so steep and
>>>> involved backtracking so much and i worry that if we make changes to our
>>>> "compatibility pledge" that don't allow for an even forward walk, we'll
>>>> wind up with a heavily fractured community.
>>>>
>>>>
>>>>
>>>> -Hoss

Re: Back Compatibility

Posted by Mark Miller <ma...@gmail.com>.

I humbly disagree about NFS. Arguing about where free time was invested,
or wasted, or inefficient, in an open source project just seems silly.
One of the great benefits is esoteric work that would normally not be
allowed for. NFS is easy. A lot of Lucene users don't care about Lucene.
They just want something easy to set up. It especially doesn't make sense
when talking about Michael. He seems to spit out Lucene code in his
sleep. I doubt NFS stuff did anything but make him more brilliant at
manipulating Lucene. It certainly hasn't made him any less prolific.

I am very much in favor of your talk about transactional support. Man do I
want Lucene to have that. But the fact that we are getting to where the
index cannot be corrupted is still a great step forward. Knowing that my
indexes will not be corrupted while running at a place that needs access
24/7 is just wonderful. I can get something working for them quick,
whether it's lost a bit of data or not. Now full support to guarantee
that my Lucene index is consistent with my Database? Even better. I
wish. But I am still very thankful for the first step of a guaranteed
consistent index.

Your glass is always half full ;) I aspire to your crankiness when I get 
older.

- Mark


robert engels wrote:
> One more example on this. A lot of work was done on transaction 
> support. I would argue that this falls way short of what is needed, 
> since there is no XA transaction support. Since the lucene index 
> (unless stored in an XA db) is a separate resource, it really needs XA 
> support in order to be consistent with the other resources.
>
> All of the transaction work that has been performed only guarantees 
> that barring a physical hardware failure the lucene index can be 
> opened and used at a known state.  This index though is probably not 
> consistent with the other resources.
>
> All that was done is that we can now guarantee that the index is 
> consistent at SOME point in time.
>
> Given the work that was done, we are probably closer to adding XA 
> support, but I think this would be much easier if the concept of a 
> transaction was made first class through the API (and then XA 
> transactions need to be supported).
>
> On Jan 22, 2008, at 2:49 PM, robert engels wrote:
>
>> I don't think group C is interested in bug fixes. I just don't see 
>> how Lucene is at all useful if the users are encountering any bug - 
>> so they either don't use that feature, or they have already developed 
>> a work-around (or they have patched the code in a way that avoids the 
>> bug, yet is specific to their environment).
>>
>> For example, I think the NFS work (bugs, fixes, etc.) was quite 
>> substantial. I think the actual number of people trying to use NFS is 
>> probably very low - as the initial implementation had so many 
>> problems (and IMO is not a very good solution for distributed indexes 
>> anyway). So all the work in trying to make NFS work "correctly" 
>> behind the scenes may have been inefficient, since a more direct, yet 
>> major fix may have solved the problem better (like distributed server 
>> support, not shared index access).
>>
>> I just think that trying to maintain API compatibility through major
>> releases is a bad idea. It leads to bloat and complex code, both
>> internal and external.  Achieving great gains in usability and/or
>> performance in a mature product like Lucene almost certainly
>> requires massive changes to the processes, algorithms and structures,
>> and the API should change as well to reflect this.
>>
>> On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:
>>
>>>
>>> : If they are "no longer actively developing the portion of the code that's
>>> : broken, aren't seeking the new feature, etc", and they stay back on old
>>> : versions... isn't that exactly what we want? They can stay on the old
>>> : version, and new application development uses the newer version.
>>>
>>> This basically mirrors a philosophy that is rising in the Perl
>>> community evangelized by (a really smart dude named chromatic) ...
>>> "why are we worrying about the effect of upgrades on users who don't
>>> upgrade?"
>>>
>>> The problem is not all users are created equal and not all users upgrade
>>> for the same reasons or at the same time...
>>>
>>> Group A: If someone is paranoid about upgrading, and is still running
>>> lucene1.4.3 because they are afraid if they upgrade their app will break
>>> and they don't want to deal with it; they don't care about known bugs in
>>> lucene1.4.3, as long as those bugs haven't impacted them yet -- these
>>> people aren't going to care whether we add a bunch of new methods to
>>> interfaces, or remove a bunch of public methods from arbitrary releases,
>>> because they are never going to see them.  They might do a total rewrite
>>> of their project later, and they'll worry about it then (when they have
>>> lots of time and QA resources)
>>>
>>> Group B: At the other extreme, are the "free-spirited" developers (god i
>>> hate that the word "agile" has been co-opted) who are always eager to
>>> upgrade to get the latest bells and whistles, and don't mind making
>>> changes to code and recompiling every time they upgrade -- just as long as
>>> there are some decent docs on what to change.
>>>
>>> Group C: In the middle is a large group of people who are interested in
>>> upgrading, who want bug fixes, are willing to write new code to take
>>> advantage of new features, in some cases are even willing to make
>>> small or medium changes to their code to get really good performance
>>> improvements ... but they don't have a lot of time or energy to constantly
>>> rewrite big chunks of their app.  For these people, knowing that they can
>>> "drop in" the new version and it will work is a big reason why they are
>>> willing to upgrade, and why they are willing to spend some time
>>> tweaking code to take advantage of the new features and the new
>>> performance enhanced APIs -- because they don't have to spend a lot of time
>>> just to get the app working as well as it was before.
>>>
>>> To draw an analogy...
>>>
>>> Group A will stand in one place for a really long time no matter how easy
>>> the path is.  Once in a great while they will decide to march forward
>>> dozens of miles in one big push, but only once they feel they have
>>> adequate resources to make the entire trip at once.
>>>
>>> Group B likes to frolic, and will happily take two steps backward and
>>> then 3 steps forward every day.
>>>
>>> Group C will walk forward with you at a steady pace, and occasionally even
>>> take a step back before moving forward, but only if the path is clear and
>>> not very steep.
>>>
>>> : I bet, if you did a poll of all Lucene users, you would find a majority of
>>> : them still only run 1.4.3, or maybe 1.9. Even with 2.0, 2.3, or 3.0, that is
>>> : still going to be the case.
>>>
>>> That's probably true, but a nice perk of our current backwards
>>> compatibility commitments is that when people pop up asking questions
>>> about 1.4.3, we can give them advice like "upgrading to 2.0.0 solves your
>>> problem" and that advice isn't a death sentence -- the steps to move
>>> forward are small and easy.
>>>
>>> I look at the way things like Maven v1 vs v2 worked out, and how
>>> that fractured the community for a long time (as far as i can tell it's
>>> still pretty fractured) because the path from v1 to v2 was so steep and
>>> involved backtracking so much and i worry that if we make changes to our
>>> "compatibility pledge" that don't allow for an even forward walk, we'll
>>> wind up with a heavily fractured community.
>>>
>>>
>>>
>>> -Hoss

Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

One more example on this. A lot of work was done on transaction
support. I would argue that this falls way short of what is needed,
since there is no XA transaction support. Since the Lucene index
(unless stored in an XA db) is a separate resource, it really needs
XA support in order to be consistent with the other resources.

All of the transaction work that has been performed only guarantees
that, barring a physical hardware failure, the Lucene index can be
opened and used at a known state.  This index though is probably not
consistent with the other resources.

All that was done is that we can now guarantee that the index is
consistent at SOME point in time.

Given the work that was done, we are probably closer to adding XA
support, but I think this would be much easier if the concept of a
transaction were made first-class through the API (and then XA
transactions need to be supported).
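
[Editorial sketch, not part of the original message.] To make "a transaction
as a first-class concept" concrete, here is a minimal two-phase-commit
outline in Java. Every name in it (TransactionalResource,
TwoPhaseCoordinator, commitAll) is hypothetical; it is neither Lucene's API
nor the real XA contract (javax.transaction.xa.XAResource), which adds
transaction IDs, recovery and timeouts. It only shows why an index would
need separate prepare and commit steps to stay consistent with a database.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical participant in a two-phase commit; not a Lucene or JTA type. */
interface TransactionalResource {
    boolean prepare();   // phase 1: vote; true means "ready to commit"
    void commit();       // phase 2: make the prepared changes visible
    void rollback();     // discard the prepared changes
}

final class TwoPhaseCoordinator {

    /** Commits all resources only if every one votes yes during prepare. */
    static boolean commitAll(List<? extends TransactionalResource> resources) {
        List<TransactionalResource> prepared = new ArrayList<>();
        for (TransactionalResource r : resources) {
            if (r.prepare()) {
                prepared.add(r);
            } else {
                // A single "no" vote aborts the transaction for everyone,
                // starting with the resource that refused.
                r.rollback();
                for (TransactionalResource p : prepared) {
                    p.rollback();
                }
                return false;
            }
        }
        for (TransactionalResource r : resources) {
            r.commit();
        }
        return true;
    }
}
```

With an index and a database both wrapped as participants, either both
commit or both roll back, which is the cross-resource consistency the
message says the existing work does not provide.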

On Jan 22, 2008, at 2:49 PM, robert engels wrote:

> I don't think group C is interested in bug fixes. I just don't see  
> how Lucene is at all useful if the users are encountering any bug -  
> so they either don't use that feature, or they have already  
> developed a work-around (or they have patched the code in a way  
> that avoids the bug, yet is specific to their environment).
>
> For example, I think the NFS work (bugs, fixes, etc.) was quite  
> substantial. I think the actual number of people trying to use NFS  
> is probably very low - as the initial implementation had so many  
> problems (and IMO is not a very good solution for distributed  
> indexes anyway). So all the work in trying to make NFS work  
> "correctly" behind the scenes may have been inefficient, since a  
> more direct, yet major fix may have solved the problem better (like  
> distributed server support, not shared index access).
>
> I just think that trying to maintain API compatibility through
> major releases is a bad idea. It leads to bloat and complex code,
> both internal and external.  Achieving great gains in usability
> and/or performance in a mature product like Lucene almost certainly
> requires massive changes to the processes, algorithms and
> structures, and the API should change as well to reflect this.
>
> On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:
>
>>
>> : If they are "no longer actively developing the portion of the code that's
>> : broken, aren't seeking the new feature, etc", and they stay back on old
>> : versions... isn't that exactly what we want? They can stay on the old
>> : version, and new application development uses the newer version.
>>
>> This basically mirrors a philosophy that is rising in the Perl
>> community evangelized by (a really smart dude named chromatic) ...
>> "why are we worrying about the effect of upgrades on users who don't
>> upgrade?"
>>
>> The problem is not all users are created equal and not all users upgrade
>> for the same reasons or at the same time...
>>
>> Group A: If someone is paranoid about upgrading, and is still running
>> lucene1.4.3 because they are afraid if they upgrade their app will break
>> and they don't want to deal with it; they don't care about known bugs in
>> lucene1.4.3, as long as those bugs haven't impacted them yet -- these
>> people aren't going to care whether we add a bunch of new methods to
>> interfaces, or remove a bunch of public methods from arbitrary releases,
>> because they are never going to see them.  They might do a total rewrite
>> of their project later, and they'll worry about it then (when they have
>> lots of time and QA resources)
>>
>> Group B: At the other extreme, are the "free-spirited" developers (god i
>> hate that the word "agile" has been co-opted) who are always eager to
>> upgrade to get the latest bells and whistles, and don't mind making
>> changes to code and recompiling every time they upgrade -- just as long as
>> there are some decent docs on what to change.
>>
>> Group C: In the middle is a large group of people who are interested in
>> upgrading, who want bug fixes, are willing to write new code to take
>> advantage of new features, in some cases are even willing to make
>> small or medium changes to their code to get really good performance
>> improvements ... but they don't have a lot of time or energy to constantly
>> rewrite big chunks of their app.  For these people, knowing that they can
>> "drop in" the new version and it will work is a big reason why they are
>> willing to upgrade, and why they are willing to spend some time
>> tweaking code to take advantage of the new features and the new
>> performance enhanced APIs -- because they don't have to spend a lot of time
>> just to get the app working as well as it was before.
>>
>> To draw an analogy...
>>
>> Group A will stand in one place for a really long time no matter how easy
>> the path is.  Once in a great while they will decide to march forward
>> dozens of miles in one big push, but only once they feel they have
>> adequate resources to make the entire trip at once.
>>
>> Group B likes to frolic, and will happily take two steps backward and
>> then 3 steps forward every day.
>>
>> Group C will walk forward with you at a steady pace, and occasionally even
>> take a step back before moving forward, but only if the path is clear and
>> not very steep.
>>
>> : I bet, if you did a poll of all Lucene users, you would find a majority of
>> : them still only run 1.4.3, or maybe 1.9. Even with 2.0, 2.3, or 3.0, that is
>> : still going to be the case.
>>
>> That's probably true, but a nice perk of our current backwards
>> compatibility commitments is that when people pop up asking questions
>> about 1.4.3, we can give them like "upgrading to 2.0.0 solves your
>> problem" and that advice isn't a death sentence -- the steps to move
>> forward are small and easy.
>>
>> I look at things the way things like Maven v1 vs v2 worked out,  
>> and how
>> that fractured the community for a long time (as far as i can tell  
>> it's
>> still pretty fractured) because the path from v1 to v2 was so  
>> steep and
>> involved backtracking so much and i worry that if we make changes  
>> to our
>> "copatibility pledge" that don't allow for an even forward walk,  
>> we'll
>> wind up with a heavily fractured community.
>>
>>
>>
>> -Hoss
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>



Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

I don't think group C is interested in bug fixes. I just don't see  
how Lucene is at all useful if the users are encountering any bug -  
so they either don't use that feature, or they have already developed  
a work-around (or they have patched the code in a way that avoids the  
bug, yet is specific to their environment).

For example, I think the NFS work (bugs, fixes, etc.) was quite  
substantial. I think the actual number of people trying to use NFS is  
probably very low - as the initial implementation had so many  
problems (and IMO is not a very good solution for distributed indexes  
anyway). So all the work in trying to make NFS work "correctly"  
behind the scenes may have been inefficient, since a more direct, yet  
major fix may have solved the problem better (like distributed server  
support, not shared index access).

I just think that trying to maintain API compatibility through major
releases is a bad idea. It leads to bloat and complex code, both
internal and external.  Achieving great gains in usability and/or
performance in a mature product like Lucene almost certainly requires
massive changes to the processes, algorithms and structures, and the
API should change as well to reflect this.

On Jan 22, 2008, at 2:30 PM, Chris Hostetter wrote:

>
> : If they are " no longer actively developing the portion of the  
> code that's
> : broken, aren't seeking the new feature, etc", and they stay back  
> on old
> : versions... isn't that exactly what we want? They can stay on the  
> old version,
> : and new application development uses the newer version.
>
> This basically mirrors a philosophy that is rising in the Perl
> community, evangelized by a really smart dude named chromatic ...
> "why are we worrying about the effect of upgrades on users who don't
> upgrade?"
>
> The problem is not all users are created equal and not all users  
> upgrade
> for the same reasons or at the same time...
>
> Group A: If someone is paranoid about upgrading, and is still running
> lucene1.4.3 because they are afraid if they upgrade their app will  
> break
> and they don't want to deal with it; they don't care about known  
> bugs in
> lucene1.4.3, as long as those bugs haven't impacted them yet -- these
> people aren't going to care whether we add a bunch of new methods to
> interfaces, or remove a bunch of public methods from arbitrary  
> releases,
> because they are never going to see them.  They might do a total  
> rewrite
> of their project later, and they'll worry about it then (when they  
> have
> lots of time and QA resources)
>
> Group B: At the other extreme are the "free-spirited" developers
> (god i hate that the word "agile" has been co-opted) who are always
> eager to upgrade to get the latest bells and whistles, and don't mind
> making changes to code and recompiling every time they upgrade -- just
> as long as there are some decent docs on what to change.
>
> Group C: In the middle is a large group of people who are interested
> in upgrading, who want bug fixes, are willing to write new code to take
> advantage of new features, and in some cases are even willing to make
> small or medium changes to their code to get really good performance
> improvements ... but they don't have a lot of time or energy to
> constantly rewrite big chunks of their app.  For these people, knowing
> that they can "drop in" the new version and it will work is a big reason
> why they are willing to upgrade, and why they are willing to spend some
> time tweaking code to take advantage of the new features and the new
> performance-enhanced APIs -- because they don't have to spend a lot of
> time just to get the app working as well as it was before.
>
> To draw an analogy...
>
> Group A will stand in one place for a really long time no matter  
> how easy
> the path is.  Once in a great while they will decide to march forward
> dozens of miles in one big push, but only once they feel they have
> adequate resources to make the entire trip at once.
>
> Group B likes to frolic, and will happily take two steps backward and
> then three steps forward every day.
>
> Group C will walk forward with you at a steady pace, and  
> occasionally even
> take a step back before moving forward, but only if the path is  
> clear and
> not very steep.
>
> : I bet, if you did a poll of all Lucene users, you would find a  
> majority of
> : them still only run 1.4.3, or maybe 1.9. Even with 2.0, 2.3, or  
> 3.0, that is
> : still going to be the case.
>
> That's probably true, but a nice perk of our current backwards
> compatibility commitments is that when people pop up asking questions
> about 1.4.3, we can give them advice like "upgrading to 2.0.0 solves
> your problem" and that advice isn't a death sentence -- the steps to
> move forward are small and easy.
>
> I look at the way things like Maven v1 vs v2 worked out, and how that
> fractured the community for a long time (as far as i can tell it's
> still pretty fractured) because the path from v1 to v2 was so steep and
> involved so much backtracking.  I worry that if we make changes to our
> "compatibility pledge" that don't allow for an even forward walk, we'll
> wind up with a heavily fractured community.
>
>
>
> -Hoss
>
>
>



Re: Back Compatibility

Posted by Chris Hostetter <ho...@fucit.org>.

: If they are " no longer actively developing the portion of the code that's
: broken, aren't seeking the new feature, etc", and they stay back on old
: versions... isn't that exactly what we want? They can stay on the old version,
: and new application development uses the newer version.

This basically mirrors a philosophy that is rising in the Perl
community, evangelized by a really smart dude named chromatic ...
"why are we worrying about the effect of upgrades on users who don't upgrade?"

The problem is not all users are created equal and not all users upgrade 
for the same reasons or at the same time...

Group A: If someone is paranoid about upgrading, and is still running 
lucene1.4.3 because they are afraid if they upgrade their app will break 
and they don't want to deal with it; they don't care about known bugs in 
lucene1.4.3, as long as those bugs haven't impacted them yet -- these 
people aren't going to care whether we add a bunch of new methods to
interfaces, or remove a bunch of public methods from arbitrary releases, 
because they are never going to see them.  They might do a total rewrite 
of their project later, and they'll worry about it then (when they have 
lots of time and QA resources)

Group B: At the other extreme are the "free-spirited" developers (god i
hate that the word "agile" has been co-opted) who are always eager to
upgrade to get the latest bells and whistles, and don't mind making
changes to code and recompiling every time they upgrade -- just as long as
there are some decent docs on what to change.

Group C: In the middle is a large group of people who are interested in
upgrading, who want bug fixes, are willing to write new code to take
advantage of new features, and in some cases are even willing to make
small or medium changes to their code to get really good performance
improvements ... but they don't have a lot of time or energy to constantly
rewrite big chunks of their app.  For these people, knowing that they can
"drop in" the new version and it will work is a big reason why they are
willing to upgrade, and why they are willing to spend some time
tweaking code to take advantage of the new features and the new
performance-enhanced APIs -- because they don't have to spend a lot of time
just to get the app working as well as it was before.

To draw an analogy...

Group A will stand in one place for a really long time no matter how easy 
the path is.  Once in a great while they will decide to march forward 
dozens of miles in one big push, but only once they feel they have 
adequate resources to make the entire trip at once.

Group B likes to frolic, and will happily take two steps backward and
then three steps forward every day.

Group C will walk forward with you at a steady pace, and occasionally even 
take a step back before moving forward, but only if the path is clear and 
not very steep.

: I bet, if you did a poll of all Lucene users, you would find a majority of
: them still only run 1.4.3, or maybe 1.9. Even with 2.0, 2.3, or 3.0, that is
: still going to be the case.

That's probably true, but a nice perk of our current backwards
compatibility commitments is that when people pop up asking questions
about 1.4.3, we can give them advice like "upgrading to 2.0.0 solves your
problem" and that advice isn't a death sentence -- the steps to move
forward are small and easy.

I look at the way things like Maven v1 vs v2 worked out, and how that
fractured the community for a long time (as far as i can tell it's
still pretty fractured) because the path from v1 to v2 was so steep and
involved so much backtracking.  I worry that if we make changes to our
"compatibility pledge" that don't allow for an even forward walk, we'll
wind up with a heavily fractured community.



-Hoss



Re: Back Compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

On Jan 17, 2008, at 9:30 PM, DM Smith wrote:

>
> On Jan 17, 2008, at 7:57 PM, robert engels wrote:
>
>> If they are " no longer actively developing the portion of the code  
>> that's broken, aren't seeking the new feature, etc", and they stay  
>> back on old versions... isn't that exactly what we want? They can  
>> stay on the old version, and new application development uses the  
>> newer version.
>>
>> It would be different if it was a core JRE interface or similar -  
>> this is an optional jar.
>>
>> Part of what always made Windows so fragile is that as it evolved  
>> they tried to maintain backward compatibility - making working with  
>> the old/new code and fixing bugs almost impossible. The bloat  
>> became impossible to deal with.
>>
>> I bet, if you did a poll of all Lucene users, you would find a  
>> majority of them still only run 1.4.3, or maybe 1.9. Even with 2.0,  
>> 2.3, or 3.0, that is still going to be the case.
>
> I found that upgrading from 1.4.3 (our first version) to 1.9, to  
> 2.0, ... 2.2 and even 2.3 rc was painless.
>
> The deprecations in 1.9 gave clear guidance on how to do the  
> upgrade. Very easy to do. And with Lucene's robust test suite, I had  
> great confidence that it would work without much testing.

And I don't think this would change at all with what I am proposing.   
We still would be giving clear guidance, we would just be saying it's  
going to happen in 1 year, not 2 to 6.

>
>
> Going forward was simply a matter of dropping in the new jar and  
> enjoying the improved performance.
>
> The forward compatibility of the actual index was a great boon.
>
> So, while one may not be actively developing code, dropping in a new  
> jar and getting huge performance gains is a great plus.
>

But even going from 2.2 to 2.3, you get even bigger gains by doing
some work and actually taking the time to update your Analysis
process, reuse tokens, etc.


-Grant


Re: Back Compatibility

Posted by DM Smith <dm...@gmail.com>.

On Jan 17, 2008, at 7:57 PM, robert engels wrote:

> If they are " no longer actively developing the portion of the code  
> that's broken, aren't seeking the new feature, etc", and they stay  
> back on old versions... isn't that exactly what we want? They can  
> stay on the old version, and new application development uses the  
> newer version.
>
> It would be different if it was a core JRE interface or similar -  
> this is an optional jar.
>
> Part of what always made Windows so fragile is that as it evolved  
> they tried to maintain backward compatibility - making working with  
> the old/new code and fixing bugs almost impossible. The bloat became  
> impossible to deal with.
>
> I bet, if you did a poll of all Lucene users, you would find a  
> majority of them still only run 1.4.3, or maybe 1.9. Even with 2.0,  
> 2.3, or 3.0, that is still going to be the case.

I found that upgrading from 1.4.3 (our first version) to 1.9, to  
2.0, ... 2.2 and even 2.3 rc was painless.

The deprecations in 1.9 gave clear guidance on how to do the upgrade.  
Very easy to do. And with Lucene's robust test suite, I had great  
confidence that it would work without much testing.

Going forward was simply a matter of dropping in the new jar and  
enjoying the improved performance.

The forward compatibility of the actual index was a great boon.

So, while one may not be actively developing code, dropping in a new  
jar and getting huge performance gains is a great plus.

Many thanks to you all for such a stable product.

-- DM Smith


Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

If they are " no longer actively developing the portion of the code  
that's broken, aren't seeking the new feature, etc", and they stay  
back on old versions... isn't that exactly what we want? They can  
stay on the old version, and new application development uses the  
newer version.

It would be different if it was a core JRE interface or similar -  
this is an optional jar.

Part of what always made Windows so fragile is that as it evolved  
they tried to maintain backward compatibility - making working with  
the old/new code and fixing bugs almost impossible. The bloat became  
impossible to deal with.

I bet, if you did a poll of all Lucene users, you would find a  
majority of them still only run 1.4.3, or maybe 1.9. Even with 2.0,  
2.3, or 3.0, that is still going to be the case.

As always, JMO.

On Jan 17, 2008, at 3:14 PM, Doug Cutting wrote:

> Grant Ingersoll wrote:
>> 1. We add a new section to CHANGES for each release, at the top  
>> where we can declare what deprecations will be removed in the  
>> _next_ release (major or minor)  and also any interface API changes
>> 2. When deprecating, the @deprecate tag should declare what  
>> version it will be removed in and that version must be one greater  
>> than the next targeted release.  That is, if the next release is  
>> 2.4, then anything deprecated in 2.3 is game to be removed in 2.9.
>
> This would mean that one could never simply drop in the new jar and  
> expect things to still work, which is something that we currently  
> try to guarantee.  That's a significant thing to give up, in terms  
> of usability.  In my experience, folks hate incompatible changes,  
> since they're frequently no longer actively developing the portion  
> of the code that's broken, aren't seeking the new feature, etc.   
> This is why lots of folks stay back on old versions.
>
> In terms of benefits, this would permit us to evolve APIs more  
> rapidly.  So it pits external usability against API evolution  
> speed, with no clear winner.  +0
>
> Doug
>
>


Re: Back Compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

On Jan 17, 2008, at 4:14 PM, Doug Cutting wrote:

> Grant Ingersoll wrote:
>> 1. We add a new section to CHANGES for each release, at the top  
>> where we can declare what deprecations will be removed in the  
>> _next_ release (major or minor)  and also any interface API changes
>> 2. When deprecating, the @deprecate tag should declare what version  
>> it will be removed in and that version must be one greater than the  
>> next targeted release.  That is, if the next release is 2.4, then  
>> anything deprecated in 2.3 is game to be removed in 2.9.
>
> This would mean that one could never simply drop in the new jar and  
> expect things to still work, which is something that we currently  
> try to guarantee.  That's a significant thing to give up, in terms  
> of usability.  In my experience, folks hate incompatible changes,  
> since they're frequently no longer actively developing the portion  
> of the code that's broken, aren't seeking the new feature, etc.   
> This is why lots of folks stay back on old versions.
>

Yep.  I agree.  I don't make the suggestion lightly and a perfectly  
valid answer is let's not bother.   By the same token, do people  
really just drop in a new release in these days of continuous  
integration and short release cycles?  At a minimum, it has to go  
through a fair amount of testing, right?

An alternative is to do major releases more often, but, to some  
extent, it's all just semantics.  The key is communicating what the  
exact changes are, regardless of what you call the version number.   
This does bring an extra burden in that we would need to be better  
about communicating upcoming changes.  That I am not exactly thrilled  
about, either.

> In terms of benefits, this would permit us to evolve APIs more  
> rapidly.  So it pits external usability against API evolution speed,  
> with no clear winner.  +0

Yep, I am still on the fence, but wanted to revisit the discussion in  
light of some recent bugs and comments about using Interfaces more.

Perhaps more important is how to handle fixing issues that change how  
a document is indexed.  Do we preserve the incorrectness for the sake  
of back-compatibility? Or do we tell them this way is flat out wrong  
and it won't be supported anymore?

-Grant


Re: Back Compatibility

Posted by Doug Cutting <cu...@apache.org>.

Grant Ingersoll wrote:
> 1. We add a new section to CHANGES for each release, at the top where we 
> can declare what deprecations will be removed in the _next_ release 
> (major or minor)  and also any interface API changes
> 2. When deprecating, the @deprecate tag should declare what version it 
> will be removed in and that version must be one greater than the next 
> targeted release.  That is, if the next release is 2.4, then anything 
> deprecated in 2.3 is game to be removed in 2.9.

This would mean that one could never simply drop in the new jar and 
expect things to still work, which is something that we currently try to 
guarantee.  That's a significant thing to give up, in terms of 
usability.  In my experience, folks hate incompatible changes, since 
they're frequently no longer actively developing the portion of the code 
that's broken, aren't seeking the new feature, etc.  This is why lots of 
folks stay back on old versions.

In terms of benefits, this would permit us to evolve APIs more rapidly.
So it pits external usability against API evolution speed, with no
clear winner.  +0

Doug


Re: Back Compatibility

Posted by DM Smith <dm...@gmail.com>.

Grant Ingersoll wrote:
>
> My reasoning for this solution:  Our minor release cycles are 
> currently in the 3-6 months range and our major release cycles are in 
> the 1-1.5 year range.  I think giving someone 4-8 (or whatever) months 
> is more than enough time to prepare for API changes.   I am not sure 
> how this would affect Index changes, but I do think we should KEEP our 
> current index reading policy where possible.  This may mean that some 
> deprecated items cannot be removed until a major release and I think 
> that is fine. 

Personally, I like the stability of Lucene.

I don't see any problems with deprecations being done earlier, but 
actual removal still at the major release.

Is there a roadmap of changes from 3.0 to 4.0 that would warrant such a
procedural change? What is Lucene missing that would require such a change?


Re: Back Compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

On Jan 17, 2008, at 2:42 PM, Bill Janssen wrote:

>> Examples of the former issue include things like removing
>> deprecations sooner and the ability to add new methods to interfaces
>> (both of these are not to be done ad-hoc)
>
> What would be the difference between ad-hoc and non-ad-hoc?
>

Maybe a bad choice of words, but I meant to say that no
interface/deprecation changes would be done without announcing them and
there being at least one release in the meantime.  Thus, if we wanted to
add isFancySchmancy() onto Fieldable today, it would have to be announced,
a patch provided and referenced, a release shipped without it (i.e. 2.3),
and then it would be available in 2.4.  By ad-hoc, I meant that we
wouldn't just announce it and then have it show up in 2.3 and not give
people time to digest it.
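
To make the announce-then-change cycle concrete, here is a minimal Java sketch. The class FieldOptions, both of its setters, and the release numbers named in the Javadoc are hypothetical and are not taken from Lucene. The only point is that the old method stays in place for at least one release, its @deprecated note says when it will go away, and it forwards to the replacement in the meantime.

```java
// Demo entry point; everything in this file is hypothetical illustration.
class DeprecationDemo {
    public static void main(String[] args) {
        FieldOptions options = new FieldOptions();
        options.setMaxFieldLength(500);   // old call site still compiles and works
        System.out.println(options.getMaxLength());
    }
}

// Hypothetical class for illustration only; not part of Lucene.
class FieldOptions {
    private int maxLength = 10000;

    /**
     * Old entry point, kept for one more release so callers have time to move.
     *
     * @deprecated Announced in the (hypothetical) 2.3 release and scheduled
     *             for removal in a later, named release. Use
     *             {@link #setMaxLength(int)} instead.
     */
    @Deprecated
    void setMaxFieldLength(int length) {
        // Forward to the replacement so both paths behave identically.
        setMaxLength(length);
    }

    /** Replacement introduced alongside the deprecation notice. */
    void setMaxLength(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must be >= 0");
        }
        this.maxLength = length;
    }

    int getMaxLength() {
        return maxLength;
    }
}
```

Callers who ignore the notice keep working until the announced release; callers who read CHANGES can switch to the replacement whenever it suits them.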

HTH,
Grant


Re: Back Compatibility

Posted by Bill Janssen <ja...@parc.com>.

> Examples of the former issue include things like removing  
> deprecations sooner and the ability to add new methods to interfaces  
> (both of these are not to be done ad-hoc)

What would be the difference between ad-hoc and non-ad-hoc?

Bill


Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

That wasn't what I was thinking. They would use lucene23.jar if they  
wanted the 2.3 API.  Newer code uses the lucene30.jar for the 3.0 API.

The others could continue to back-port 3.0 features to 2.3.X if they  
wished (and could do so without changing the API - private changes  
only).

I think you can look at the Oracle JDBC drivers as an example. They  
warn that an API is going away in the next release, then it is gone.  
Yet the new drivers may perform much better (not to mention fix a lot  
of bugs). If you weren't using the old features you can easily move  
to the new jar, if not, you need to change your code. Granted, it is  
much better now, since they no longer have many needed proprietary  
features, and rely mostly on the JDBC specification. They often  
release newer versions of earlier releases when critical bugs have  
been fixed.

JDBC is another good example. JDBC 3.0 requires Java 5. You cannot  
use JDBC 3.0 features without it. Because of this, many of the db  
vendors' latest drivers are Java 5 only, and you need to use a  
previous release if running under 1.4.

On Jan 18, 2008, at 1:04 AM, Karl Wettin wrote:

>
> 18 jan 2008 kl. 07.41 skrev robert engels:
>
>> Look at similar problems and how they are handled in the JDK. The Date
>> class has been notorious since its inception. The Calendar class
>> is almost no better, now they are developing JSR-310 to replace both.
>>
>> Existing code can still use the Date or Calendar classes. But
>> they don't get any "newer" features. This would be similar to using
>> the old lucene jar.
>
> Sort of keeping all versions in the trunk at once? IndexWriter2 is
> IndexWriter with some features replaced with something better?
> And then IndexWriter3..? That's a bit messy if you ask me. But it
> would work. But terribly messy.
>
> -- 
> karl
>
>


Re: Back Compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

Me too...

On Jan 18, 2008, at 4:33 AM, Uwe Schindler wrote:

>> Sort of keeping all versions in the trunk at once? IndexWriter2 is
>> IndexWriter with some features replaced with something better?
>> And then IndexWriter3..? That's a bit messy if you ask me. But it
>> would work. But terribly messy.
>
> Brrr, I hate this. Microsoft does this always when they update their  
> COM
> interfaces...
>
> Uwe
>
>
>



RE: Back Compatibility

Posted by Uwe Schindler <uw...@thetaphi.de>.

> Sort of keeping all versions in the trunk at once? IndexWriter2 is
> IndexWriter with some features replaced with something better?
> And then IndexWriter3..? That's a bit messy if you ask me. But it
> would work. But terribly messy.

Brrr, I hate this. Microsoft does this always when they update their COM
interfaces...

Uwe



Re: Back Compatibility

Posted by Karl Wettin <ka...@gmail.com>.

18 jan 2008 kl. 07.41 skrev robert engels:

> Look at similar problems and how they are handled in the JDK. The Date
> class has been notorious since its inception. The Calendar class is
> almost no better, now they are developing JSR-310 to replace both.
>
> Existing code can still use the Date or Calendar classes. But they
> don't get any "newer" features. This would be similar to using the old
> lucene jar.

Sort of keeping all versions in the trunk at once? IndexWriter2 is
IndexWriter with some features replaced with something better?
And then IndexWriter3..? That's a bit messy if you ask me. But it
would work. But terribly messy.

-- 
karl


Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

That brings us back to an earlier discussion: "if majority want to
break compatibility, then we should do so, and the minority can
back-port the changes to a previous release if they feel it is warranted."

I don't understand why that isn't a viable approach.

I agree that maintaining interface compatibility through versions is  
a great ideal, but when the API becomes so bloated (deprecated  
methods, and even usage patterns), it is much harder to learn, and  
use properly.

Look at similar problems and how they were handled in the JDK. The Date
class has been notorious since its inception. The Calendar class is
almost no better; now they are developing JSR-310 to replace both.

Existing code can still use the Date or Calendar classes. But they
don't get any "newer" features. This would be similar to using the old
Lucene jar.
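
As a rough illustration of the JDK comparison, the sketch below reads the
same year through three generations of date API. It assumes Java 8 or
later, where the work from JSR-310 is available as the java.time package;
the class and method names are made up for this example.

```java
import java.time.LocalDate;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

class DateApiGenerations {
    // Oldest API: this constructor and getYear() have been deprecated
    // since JDK 1.1, yet code that uses them still compiles and runs.
    @SuppressWarnings("deprecation")
    static int yearViaDate() {
        Date d = new Date(108, 0, 18); // year is an offset from 1900, month is 0-based
        return d.getYear() + 1900;
    }

    // Second generation: Calendar took over from the deprecated Date methods.
    static int yearViaCalendar() {
        Calendar c = new GregorianCalendar(2008, Calendar.JANUARY, 18);
        return c.get(Calendar.YEAR);
    }

    // Third generation: java.time, the API that grew out of JSR-310.
    static int yearViaJavaTime() {
        return LocalDate.of(2008, 1, 18).getYear();
    }

    public static void main(String[] args) {
        System.out.println(yearViaDate());
        System.out.println(yearViaCalendar());
        System.out.println(yearViaJavaTime());
    }
}
```

The old entry points keep working, but new capabilities only appear in the
newest API, which is the trade-off described above.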

On Jan 18, 2008, at 12:31 AM, Karl Wettin wrote:

>
> On 18 Jan 2008, at 03:39, Grant Ingersoll wrote:
>
>> Does anyone have experience w/ how other open source projects deal  
>> with this?
>
> Would be a pain to implement, but it could be done as libcompat.
>
> lucene-2.4-compat-core-3.0.jar
>
>
> -- 
> karl

---------------------------------------------------------------------

Re: Back Compatibility

Posted by Karl Wettin <ka...@gmail.com>.

On 18 Jan 2008, at 03:39, Grant Ingersoll wrote:

> Does anyone have experience w/ how other open source projects deal  
> with this?

Would be a pain to implement, but it could be done as libcompat.

lucene-2.4-compat-core-3.0.jar
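
A minimal sketch of what such a compatibility jar could contain, assuming
it is a thin adapter layer: classes that keep the old method names and
old behavior and forward to the new core. NewIndex, OldIndex and their
methods are hypothetical stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the "3.0 core": only the new-style API exists here.
class NewIndex {
    private final List<String> docs = new ArrayList<>();

    // New-style call: the caller states explicitly whether to trim the text.
    void add(String text, boolean trim) {
        docs.add(trim ? text.trim() : text);
    }

    int size() {
        return docs.size();
    }

    String get(int i) {
        return docs.get(i);
    }
}

// Hypothetical class shipped in the "compat" jar: it keeps the old method
// names and the old behavior (always trim) and delegates to the new core.
class OldIndex {
    private final NewIndex delegate = new NewIndex();

    void addDoc(String text) {
        delegate.add(text, true);
    }

    int docCount() {
        return delegate.size();
    }

    String doc(int i) {
        return delegate.get(i);
    }
}

class CompatDemo {
    public static void main(String[] args) {
        OldIndex legacy = new OldIndex();
        legacy.addDoc("  hello  ");
        System.out.println(legacy.docCount() + " " + legacy.doc(0));
    }
}
```

Application code written against the old names would keep compiling as
long as the compat jar is on the classpath next to the new core.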


-- 
karl

---------------------------------------------------------------------

Re: Back Compatibility

Posted by Doug Cutting <cu...@apache.org>.

Grant Ingersoll wrote:
> Does anyone have experience w/ how other open source projects deal with 
> this?

Use abstract base classes instead of interfaces: they're much easier to 
evolve back-compatibly.  In Hadoop, for example, we really wish that 
Mapper and Reducer were not interfaces and are very happy that 
FileSystem is an abstract class.
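
A small sketch of why this works, using made-up names (ScorerContract,
AbstractScorer, ConstantScorer) rather than any Lucene or Hadoop type. In
the Java versions under discussion (1.4 and 1.5) an interface cannot
carry a method body, so every new interface method breaks existing
implementers, while an abstract class can add a method with a default body.

```java
// Published as an interface: adding a second method here in a later
// release would stop every existing implementer from compiling.
interface ScorerContract {
    float score(int doc);
}

// Published as an abstract class instead.
abstract class AbstractScorer {
    public abstract float score(int doc);

    // Added in a later release. It ships with a default body, so
    // subclasses written against the earlier release keep working.
    public float maxScore() {
        return Float.MAX_VALUE;
    }
}

// User code written against the first release; it knows nothing of maxScore().
class ConstantScorer extends AbstractScorer {
    @Override
    public float score(int doc) {
        return 1.0f;
    }
}

class AbstractClassDemo {
    public static void main(String[] args) {
        AbstractScorer s = new ConstantScorer();
        System.out.println(s.score(42));
        System.out.println(s.maxScore());
    }
}
```

The price is that a class can extend only one base class, so this
approach fits types that users are not expected to mix with another
superclass.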

Doug

---------------------------------------------------------------------

Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

I don't think I can say that this needs to happen now either. :)

An interesting question to answer would be:

If Lucene did not exist, and given all of the knowledge we have, we  
decided to create a Java based search engine, would the API look like  
it does today?

The answer may be yes. I doubt it would be in many areas though.

The major releases are where you get to rethink the API and the  
approach.

If you don't do this, Lucene will slowly die (as you stated).  What
happens is that the developers get tired of the harness and start a
new project.  If the API could be changed more easily, this would
not happen.


On Jan 23, 2008, at 4:27 PM, Michael McCandless wrote:

>
> robert engels wrote:
>
>> I think you are incorrect.
>>
>> I would guess the number of people/organizations using Lucene vs.  
>> contributing to Lucene is much greater.
>>
>> The contributors work in head (should IMO). The users can select a
>> particular version of Lucene and code their apps accordingly. They  
>> can also back-port features from a later to an earlier release. If  
>> they have limited development resources, they are probably not  
>> working on Lucene (they are working on their apps), but they can  
>> update their own code to work with later versions - which they  
>> would probably rather do than learning the internals and  
>> contributing to Lucene.
>>
>> If the users are "just dropping in a new version" they are not  
>> contributing to the community... I think just the opposite, they  
>> are parasites.  I think a way to gauge this would be the number of  
>> questions/people on the user list versus the development list.
>
> I don't think they are parasites at all.  They are users that place
> a lot of trust in us and will come to the users list with
> interesting issues.  Many of the improvements to Lucene are sourced  
> from the users list.  Even if that user doesn't do the actual work  
> to fix the issue, their innocent question and prodding can inspire  
> someone else to take the idea forward, make a patch, etc.  This is  
> the normal and healthy way that open source works....
>
>> Lucene is a library, and I believe what I stated earlier is
>> true - in order to continue to advance it the API needs to be
>> permitted to change to allow for better functionality and  
>> performance. If Lucene is hand-tied by earlier APIs then this work  
>> is either not going to happen, or be very messy (inefficient).
>
> The thing is, we have been able to advance lately, sizably, without  
> breaking APIs, thanks to the "future backwards compatibility  
> proofing" that Lucene does.
>
> I do agree that if it got to the point where we were forced to make
> a hard choice between stunting Lucene's growth so as to keep backwards
> compatibility and letting Lucene grow by making a new major release, we
> should definitely make a new major release.  Search is still young
> and if we stunt Lucene now it will slowly die.
>
> It's just that I haven't seen any recent change, except for  
> allowing JVM 1.5 source, that actually requires a major release, I  
> think.
>
> Mike
>
>> On Jan 23, 2008, at 3:40 PM, Chris Hostetter wrote:
>>
>>>
>>> : I guess I don't see the back-porting as an issue. Only those that
>>> : want to need to do the back-porting. Head moves on...
>>>
>>> I view it as a potential risk to the overall productivity of the
>>> community.
>>>
>>> If upgrading from A to B is easy, people (in general) won't spend a
>>> lot of time/effort backporting features from B to A -- this
>>> time/effort savings benefits the community because (depending on the
>>> person):
>>>  1) that time/effort saved can be spent contributing even more
>>>     features to Lucene
>>>  2) that time/effort saved improves the impressions people have of
>>>     Lucene.
>>>
>>> If on the other hand upgrading from X to Y is "hard", that encourages
>>> people to backport features ... in some cases this backporting may be
>>> done "in the open" with people contributing these backports as
>>> patches, which can then be committed/released by developers ... but
>>> there is still a time/effort cost there ... a bigger time/effort cost
>>> is the cumulative time/effort cost of all the people that backport
>>> some set of features just enough to get things working for themselves
>>> on their local copy, and don't contribute those changes back ... that
>>> cost gets paid by the community as a whole over and over again.
>>>
>>> I certainly don't want to discourage anyone who *wants* to backport
>>> features, and I would never suggest that Lucene should make it a
>>> policy to not accept patches to previous releases that backport
>>> functionality -- I just think we should do our best to minimize the
>>> need/motivation to spend time/effort on backporting.
>>>
>>>
>>>
>>> -Hoss


---------------------------------------------------------------------

Re: Back Compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

robert engels wrote:

> I think you are incorrect.
>
> I would guess the number of people/organizations using Lucene vs.  
> contributing to Lucene is much greater.
>
> The contributors work in head (should IMO). The users can select a
> particular version of Lucene and code their apps accordingly. They  
> can also back-port features from a later to an earlier release. If  
> they have limited development resources, they are probably not  
> working on Lucene (they are working on their apps), but they can  
> update their own code to work with later versions - which they  
> would probably rather do than learning the internals and  
> contributing to Lucene.
>
> If the users are "just dropping in a new version" they are not  
> contributing to the community... I think just the opposite, they  
> are parasites.  I think a way to gauge this would be the number of  
> questions/people on the user list versus the development list.

I don't think they are parasites at all.  They are users that place
a lot of trust in us and will come to the users list with interesting
issues.  Many of the improvements to Lucene are sourced from the  
users list.  Even if that user doesn't do the actual work to fix the  
issue, their innocent question and prodding can inspire someone else  
to take the idea forward, make a patch, etc.  This is the normal and  
healthy way that open source works....

> Lucene is a library, and I believe what I stated earlier is true
> - in order to continue to advance it the API needs to be permitted  
> to change to allow for better functionality and performance. If  
> Lucene is hand-tied by earlier APIs then this work is either not  
> going to happen, or be very messy (inefficient).

The thing is, we have been able to advance lately, sizably, without  
breaking APIs, thanks to the "future backwards compatibility  
proofing" that Lucene does.

I do agree that if it got to the point where we were forced to make a
hard choice between stunting Lucene's growth so as to keep backwards
compatibility and letting Lucene grow by making a new major release, we
should definitely make a new major release.  Search is still young
and if we stunt Lucene now it will slowly die.

It's just that I haven't seen any recent change, except for allowing  
JVM 1.5 source, that actually requires a major release, I think.

Mike

> On Jan 23, 2008, at 3:40 PM, Chris Hostetter wrote:
>
>>
>> : I guess I don't see the back-porting as an issue. Only those that
>> : want to need to do the back-porting. Head moves on...
>>
>> I view it as a potential risk to the overall productivity of the
>> community.
>>
>> If upgrading from A to B is easy, people (in general) won't spend a
>> lot of time/effort backporting features from B to A -- this
>> time/effort savings benefits the community because (depending on the
>> person):
>>  1) that time/effort saved can be spent contributing even more
>>     features to Lucene
>>  2) that time/effort saved improves the impressions people have of
>>     Lucene.
>>
>> If on the other hand upgrading from X to Y is "hard", that encourages
>> people to backport features ... in some cases this backporting may be
>> done "in the open" with people contributing these backports as
>> patches, which can then be committed/released by developers ... but
>> there is still a time/effort cost there ... a bigger time/effort cost
>> is the cumulative time/effort cost of all the people that backport
>> some set of features just enough to get things working for themselves
>> on their local copy, and don't contribute those changes back ... that
>> cost gets paid by the community as a whole over and over again.
>>
>> I certainly don't want to discourage anyone who *wants* to backport
>> features, and I would never suggest that Lucene should make it a
>> policy to not accept patches to previous releases that backport
>> functionality -- I just think we should do our best to minimize the
>> need/motivation to spend time/effort on backporting.
>>
>>
>>
>> -Hoss


---------------------------------------------------------------------

Re: Back Compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

+1.  And, we always have the major version release at our disposal if  
need be.

At any rate, I think we have beaten this one to death.  I think it is
useful to look back every now and then on the major things that
guide us and make sure we all still agree, at least for the most
part.   For now, I think our plan is pretty straightforward.  2.4
pretty quickly (3 months?) and then 2.9, all of which will be
back-compat.  Then onto 3.0 which will be a full upgrade to Java 1.5,
thus dropping support for 1.4.

-Grant


On Jan 27, 2008, at 8:05 PM, Chris Hostetter wrote:

> : I would guess the number of people/organizations using Lucene vs.
> : contributing to Lucene is much greater.
> :
> : The contributors work in head (should IMO). The users can select a
> : particular version of Lucene and code their apps accordingly. They can
> : also back-port features from a later to an earlier release. If they
> : have limited development resources, they are probably not working on
> : Lucene (they are working on their apps), but they can update their own
> : code to work with later versions - which they would probably rather do
> : than learning the internals and contributing to Lucene.
>
> I think we have a semantic disconnect on the definition of "community".
>
> I am including any and all people/projects that use Lucene in any way
> -- whether or not they contribute back.  If there are 1000 projects
> using Lucene as a library, and each project requires 5 man hours of
> work to upgrade from version X to version Y because of a non-backwards
> compatible change, but it would only take 2 man hours of work for those
> projects to backport / rip out the one or two features of version Y
> they really want and cram them into their code base, then the community
> as a whole is paying a really heavy cost for version Y ... regardless
> of whether each of those 1000 projects invests the 5 hours or the 2
> hours ... in the first extreme we're all spending a cumulative total of
> 5000 man hours.  In the second case we're spending 2000 man hours, and
> now we've got 1000 apps that are running hacked-up unofficial offshoots
> of version X that will never be able to upgrade to version Z when it
> comes out -- the community not only becomes very fractured but Lucene
> as a whole gets a bad rap, because everybody talks about how they still
> run version X with local patches instead of using version Y -- it makes
> new users wonder "what's wrong with version Y?" ... "if upgrading is so
> hard that no one does it, do I really want to use this library?"
>
> It may seem like a socialist or a communist or a free love hippy
> attitude, but if contributors and committers take extra time to develop
> more incremental releases and backwards compatible API transitions it
> may cost them more time upfront, but it saves the community as a whole
> a *lot* of time in the long run.
>
> By all means: we should move forward anytime really great improvements
> can be made through new APIs and new features -- but we need to keep in
> mind that if those new APIs and features are hard for our current user
> base to adapt to, then we aren't doing the community as a whole any
> favors by throwing the baby out with the bath water and prematurely
> throwing away an old API in order to support the new one.
>
> Trade offs must be made.  Sometimes that may mean sacrificing committer
> man hours; or performance; or API cleanliness; in order to reap the
> benefit of a strong, happy, healthy, community.
>
>
>
> -Hoss

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





---------------------------------------------------------------------

Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

And then you can end up like the Soviet Union...

The basic problems of communism - those that don't contribute their  
fair share, but suck out the minimum resources (but maximum in  
totality), and those that want to lead (their contribution) and suck  
the minimum, and then those that contribute the most to make up for  
everyone else, and quickly say this SUCKS....


On Jan 27, 2008, at 7:05 PM, Chris Hostetter wrote:

> : I would guess the number of people/organizations using Lucene vs.
> : contributing to Lucene is much greater.
> :
> : The contributors work in head (should IMO). The users can select a
> : particular version of Lucene and code their apps accordingly. They can
> : also back-port features from a later to an earlier release. If they
> : have limited development resources, they are probably not working on
> : Lucene (they are working on their apps), but they can update their own
> : code to work with later versions - which they would probably rather do
> : than learning the internals and contributing to Lucene.
>
> I think we have a semantic disconnect on the definition of "community".
>
> I am including any and all people/projects that use Lucene in any way
> -- whether or not they contribute back.  If there are 1000 projects
> using Lucene as a library, and each project requires 5 man hours of
> work to upgrade from version X to version Y because of a non-backwards
> compatible change, but it would only take 2 man hours of work for those
> projects to backport / rip out the one or two features of version Y
> they really want and cram them into their code base, then the community
> as a whole is paying a really heavy cost for version Y ... regardless
> of whether each of those 1000 projects invests the 5 hours or the 2
> hours ... in the first extreme we're all spending a cumulative total of
> 5000 man hours.  In the second case we're spending 2000 man hours, and
> now we've got 1000 apps that are running hacked-up unofficial offshoots
> of version X that will never be able to upgrade to version Z when it
> comes out -- the community not only becomes very fractured but Lucene
> as a whole gets a bad rap, because everybody talks about how they still
> run version X with local patches instead of using version Y -- it makes
> new users wonder "what's wrong with version Y?" ... "if upgrading is so
> hard that no one does it, do I really want to use this library?"
>
> It may seem like a socialist or a communist or a free love hippy
> attitude, but if contributors and committers take extra time to develop
> more incremental releases and backwards compatible API transitions it
> may cost them more time upfront, but it saves the community as a whole
> a *lot* of time in the long run.
>
> By all means: we should move forward anytime really great improvements
> can be made through new APIs and new features -- but we need to keep in
> mind that if those new APIs and features are hard for our current user
> base to adapt to, then we aren't doing the community as a whole any
> favors by throwing the baby out with the bath water and prematurely
> throwing away an old API in order to support the new one.
>
> Trade offs must be made.  Sometimes that may mean sacrificing committer
> man hours; or performance; or API cleanliness; in order to reap the
> benefit of a strong, happy, healthy, community.
>
>
>
> -Hoss


---------------------------------------------------------------------

Re: Back Compatibility

Posted by Endre Stølsvik <En...@stolsvik.com>.

> It may seem like a socialist or a communist or a free love hippy attitude,

It sounds like a perfect attitude.

(In particular the "free love hippie" part - does it come with LSD and 
tie-dyed/batik clothes too?)

Kind regards,
Endre.

---------------------------------------------------------------------

Re: Back Compatibility

Posted by Chris Hostetter <ho...@fucit.org>.

: I would guess the number of people/organizations using Lucene vs. contributing
: to Lucene is much greater.
: 
: The contributors work in head (should IMO). The users can select a particular
: version of Lucene and code their apps accordingly. They can also back-port
: features from a later to an earlier release. If they have limited development
: resources, they are probably not working on Lucene (they are working on their
: apps), but they can update their own code to work with later versions - which
: they would probably rather do than learning the internals and contributing to
: Lucene.

I think we have a semantic disconnect on the definition of "community".

I am including any and all people/projects that use Lucene in any way -- 
whether or not they contribute back.  If there are 1000 projects 
using Lucene as a library, and each project requires 5 man hours of work 
to upgrade from version X to version Y because of a non-backwards 
compatible change, but it would only take 2 man hours of work for those 
projects to backport / rip out the one or two features of version Y they 
really want and cram them into their code base, then the community as a 
whole is paying a really heavy cost for version Y ... regardless of whether 
each of those 1000 projects invests the 5 hours or the 2 hours ... in the 
first extreme we're all spending a cumulative total of 5000 man hours.  In 
the second case we're spending 2000 man hours, and now we've got 1000 apps 
that are running hacked-up unofficial offshoots of version X that will 
never be able to upgrade to version Z when it comes out -- the community 
not only becomes very fractured but Lucene as a whole gets a bad rap, 
because everybody talks about how they still run version X with local 
patches instead of using version Y -- it makes new users wonder "what's 
wrong with version Y?" ... "if upgrading is so hard that no one does it, do 
I really want to use this library?"
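
The arithmetic behind those totals can be written out as a tiny sketch.
The figures (1000 projects, 5 or 2 hours each) are the illustrative ones
from the paragraph above, not measurements, and the class name is made up.

```java
class UpgradeCost {
    // Cumulative hours across the community for a given per-project cost.
    static long totalHours(int projects, int hoursPerProject) {
        return (long) projects * hoursPerProject;
    }

    public static void main(String[] args) {
        int projects = 1000;
        // Everyone upgrades across the incompatible change.
        System.out.println("upgrade:  " + totalHours(projects, 5));
        // Everyone backports privately and stays on a patched version X.
        System.out.println("backport: " + totalHours(projects, 2));
    }
}
```

Either way the cost scales with the number of downstream projects, while
a compatible transition is paid for once, upstream.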

It may seem like a socialist or a communist or a free love hippy attitude, 
but if contributors and committers take extra time to develop more 
incremental releases and backwards compatible API transitions it may cost 
them more time upfront, but it saves the community as a whole a *lot* of 
time in the long run.

By all means: we should move forward anytime really great improvements can 
be made through new APIs and new features -- but we need to keep in mind 
that if those new APIs and features are hard for our current user base to 
adapt to, then we aren't doing the community as a whole any favors by 
throwing the baby out with the bath water and prematurely throwing away 
an old API in order to support the new one.  

Trade offs must be made.  Sometimes that may mean sacrificing committer 
man hours; or performance; or API cleanliness; in order to reap the 
benefit of a strong, happy, healthy, community.



-Hoss


---------------------------------------------------------------------

Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

The statement upon rereading seems much stronger than intended. You
are correct, but I think the number of users that become contributors
is still far less than the number of users.

The only abandonment of the users was from the standpoint of  
maintaining a legacy API. The users are free to update their code to  
move with Lucene. They are the ones choosing to stay behind.

Even though I have contributed very little to Lucene, I still fight
for the developers' ability to move it forward - since I do contribute
so little! It is up to me to update my code, or stay where I am
at. Now, if Lucene created a release every week that completely  
changed the API and broke everything I wrote, while the old release  
still had numerous serious bugs, I would quickly grow frustrated and  
find a new library. That is not the case, and I don't think anyone  
(especially me) is arguing for that.

On Jan 23, 2008, at 4:29 PM, Steven A Rowe wrote:

> Hi robert,
>
> On 01/23/2008 at 4:55 PM, robert engels wrote:
>> If the users are "just dropping in a new version" they are not
>> contributing to the community... I think just the opposite, they are
>> parasites.
>
> I reject your characterization of passive users as "parasites"; I  
> suspect that you intend your casual use of this highly prejudicial  
> term to license wholesale abandonment of them as a valid constituency.
>
> In my estimation, nearly every active contributor to open source  
> projects, including Lucene, was once a passive user.  If you  
> discourage that pipeline, you cut off the supply of fresh  
> perspectives and future contributions.  Please, let's not do that.
>
> Steve

---------------------------------------------------------------------

RE: Back Compatibility

Posted by Steven A Rowe <sa...@syr.edu>.

Hi robert,

On 01/23/2008 at 4:55 PM, robert engels wrote:
> If the users are "just dropping in a new version" they are not
> contributing to the community... I think just the opposite, they are
> parasites.

I reject your characterization of passive users as "parasites"; I suspect that you intend your casual use of this highly prejudicial term to license wholesale abandonment of them as a valid constituency.

In my estimation, nearly every active contributor to open source projects, including Lucene, was once a passive user.  If you discourage that pipeline, you cut off the supply of fresh perspectives and future contributions.  Please, let's not do that.

Steve

---------------------------------------------------------------------

Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

I think you are incorrect.

I would guess the number of people/organizations using Lucene vs.  
contributing to Lucene is much greater.

The contributors work in head (should IMO). The users can select a
particular version of Lucene and code their apps accordingly. They  
can also back-port features from a later to an earlier release. If  
they have limited development resources, they are probably not  
working on Lucene (they are working on their apps), but they can  
update their own code to work with later versions - which they would  
probably rather do than learning the internals and contributing to  
Lucene.

If the users are "just dropping in a new version" they are not  
contributing to the community... I think just the opposite, they are  
parasites.  I think a way to gauge this would be the number of  
questions/people on the user list versus the development list.

Lucene is a library, and I believe what I stated earlier is true -
in order to continue to advance it the API needs to be permitted to  
change to allow for better functionality and performance. If Lucene  
is hand-tied by earlier APIs then this work is either not going to  
happen, or be very messy (inefficient).

On Jan 23, 2008, at 3:40 PM, Chris Hostetter wrote:

>
> : I guess I don't see the back-porting as an issue. Only those that
> : want to need to do the back-porting. Head moves on...
>
> I view it as a potential risk to the overall productivity of the
> community.
>
> If upgrading from A to B is easy, people (in general) won't spend a
> lot of time/effort backporting features from B to A -- this
> time/effort savings benefits the community because (depending on the
> person):
>  1) that time/effort saved can be spent contributing even more
>     features to Lucene
>  2) that time/effort saved improves the impressions people have of
>     Lucene.
>
> If on the other hand upgrading from X to Y is "hard", that encourages
> people to backport features ... in some cases this backporting may be
> done "in the open" with people contributing these backports as
> patches, which can then be committed/released by developers ... but
> there is still a time/effort cost there ... a bigger time/effort cost
> is the cumulative time/effort cost of all the people that backport
> some set of features just enough to get things working for themselves
> on their local copy, and don't contribute those changes back ... that
> cost gets paid by the community as a whole over and over again.
>
> I certainly don't want to discourage anyone who *wants* to backport
> features, and I would never suggest that Lucene should make it a
> policy to not accept patches to previous releases that backport
> functionality -- I just think we should do our best to minimize the
> need/motivation to spend time/effort on backporting.
>
>
>
> -Hoss

---------------------------------------------------------------------

Re: Back Compatibility

Posted by Chris Hostetter <ho...@fucit.org>.

: I guess I don't see the back-porting as an issue. Only those that want to need
: to do the back-porting. Head moves on...

I view it as a potential risk to the overall productivity of the community.

If upgrading from A to B is easy, people (in general) won't spend a lot of 
time/effort backporting features from B to A -- this time/effort savings 
benefits the community because (depending on the person):
 1) that time/effort saved can be spent contributing even more features 
    to Lucene
 2) that time/effort saved improves the impressions people have of Lucene.

If on the other hand upgrading from X to Y is "hard", that encourages 
people to backport features ... in some cases this backporting may be done 
"in the open" with people contributing these backports as patches, which 
can then be committed/released by developers ... but there is still a 
time/effort cost there ... a bigger time/effort cost is the cumulative 
time/effort cost of all the people that backport some set of features just 
enough to get things working for themselves on their local copy, and don't 
contribute those changes back ... that cost gets paid by the community as a 
whole over and over again.

I certainly don't want to discourage anyone who *wants* to backport 
features, and I would never suggest that Lucene should make it a policy to 
not accept patches to previous releases that backport functionality -- I 
just think we should do our best to minimize the need/motivation to spend 
time/effort on backporting.



-Hoss



Re: Back Compatibility

Posted by robert engels <re...@ix.netcom.com>.

I guess I don't see the back-porting as an issue. Only those who want
back-ports need to do the back-porting. Head moves on...


On Jan 23, 2008, at 2:00 PM, Chris Hostetter wrote:

>
> : I do like the idea of a static/system property to match legacy
> : behavior.  For example, the bugs around how StandardTokenizer
> : mislabels tokens (eg LUCENE-1100), this would be the perfect  
> solution.
> : Clearly those are silly bugs that should be fixed, quickly, with  
> this
> : back-compatible mode to keep the bug in place.
> :
> : We might want to, instead, have ctors for many classes take a  
> required
> : arg which states the version of Lucene you are using?  So if you are
> : writing a new app you would pass in the current version.  Then, on
> : dropping in a future Lucene JAR, we could use that arg to enforce  
> the
> : right backwards compatibility.  This would save users from having to
> : realize they are hitting one of these situations and then know to go
> : set the right static/property to retain the buggy behavior.
>
> I'm not sure that this would be better though ... when i write my code, i
> pass "2.3" to all these constructors (or factory methods) and then later i
> want to upgrade to 2.4 to get all the new performance goodness ... i
> shouldn't have to change all those constructor calls to get all the 2.4
> goodness, i should be able to leave my code as is -- but if i do that,
> then i might not get all the 2.4 goodness, (like improved
> tokenization, or more precise segment merging) because some of that
> goodness violates previous assumptions that some code might have had ...
> my code doesn't have those assumptions, i know nothing about them, i'll
> take whatever behavior the Lucene Developers recommend (unless i see
> evidence that it breaks something, in which case i'll happily set a
> system property or something that the release notes say will force the
> old behavior).
>
> The basic principle being: by default, give users the behavior that is
> generally viewed as "correct" -- but give them the option to force
> "uncorrect" legacy behavior.
>
> : Also, backporting is extremely costly over time.  I'd much rather  
> keep
> : compatibility for longer on our forward releases, than spend our
> : scarce resources moving changes back.
>
> +1
>
> : So to summarize ... I think we should have (keep) a high  
> tolerance for
> : cruft to maintain API compatibility.  I think our current approach
> : (try hard to keep compatibility during "minor" releases, then
> : deprecate, then remove APIs on a major release; do major releases  
> only
> : when truly required) is a good one.
>
> i'm with you for the most part, it's just the definition of "when truly
> required" that tends to hang people up ... there's a chicken vs egg
> problem of deciding whether the code should drive what the next release
> number is: "i've added a bitch'n feature but it requires adding a method
> to an interface, therefore the next release must be called 4.0" ... vs the
> mindset that "we just had a 3.0 release, it's too soon for another major
> release, the next release should be called 3.1, so we need to hold off on
> committing non backwards compatible changes for a while."
>
> I'm in the first camp: version numbers should be descriptive, information
> carrying, labels for releases -- but the version number of a release
> should be dictated by the code contained in that release.  (if that means
> the next version after 3.0.0 is 4.0.0, then so be it.)
>
>
> -Hoss
>
>



Re: Back Compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

+1
On Jan 27, 2008, at 8:34 PM, Chris Hostetter wrote:

>
> : But I do agree, benchmark doesn't have the same litmus test.
>
> the generalization of that statement probably being "all contribs  
> are not
> created equal."
>
> I propose adding some comments to the BackwardsCompatibility wiki page
> noting that the compatibility commitments of contribs depend largely on
> their maturity and intended usage, and that the README.txt file for each
> contrib will identify its approach to compatibility.
>
> we can put some boilerplate in the README for most of the contribs, and
> special verbiage in the README for the special contribs.
>
>
> -Hoss
>
>




Re: Back Compatibility

Posted by Chris Hostetter <ho...@fucit.org>.

: But I do agree, benchmark doesn't have the same litmus test.

the generalization of that statement probably being "all contribs are not 
created equal."

I propose adding some comments to the BackwardsCompatibility wiki page
noting that the compatibility commitments of contribs depend largely on
their maturity and intended usage, and that the README.txt file for each
contrib will identify its approach to compatibility.

we can put some boilerplate in the README for most of the contribs, and
special verbiage in the README for the special contribs.


-Hoss



Re: Back Compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

Well, contrib/Wikipedia has a dependency on it, but at least it is  
self contained.  I would love to see the Wikipedia stuff extracted out  
of benchmark and be in contrib/wikipedia (thus flipping the  
dependency), but the effort isn't particularly high on my list.

But I do agree, benchmark doesn't have the same litmus test.

-Grant

On Jan 25, 2008, at 4:01 PM, Doron Cohen wrote:

> On Jan 25, 2008 8:04 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
>> One more thought on back compatibility:
>>
>> Do we have the same requirements for any and all contrib modules?  I
>> am especially thinking about the benchmark contrib, but it probably
>> applies to others as well.
>>
>> -Grant
>>
>
> In general I think that contrib should have the same requirements, because
> there may be applications out there depending on it - e.g. highlighting,
> spell-correction - and here too, unstable packages can be marked with
> a temporary warning such as those we currently have for search.function.
>
> benchmark is different in that - I think - there are no applications  
> that
> depend on it, so perhaps we can have more flexibility in it?
>
> Doron




Re: Back Compatibility

Posted by Doron Cohen <cd...@gmail.com>.

On Jan 25, 2008 8:04 PM, Grant Ingersoll <gs...@apache.org> wrote:

> One more thought on back compatibility:
>
> Do we have the same requirements for any and all contrib modules?  I
> am especially thinking about the benchmark contrib, but it probably
> applies to others as well.
>
> -Grant
>

In general I think that contrib should have the same requirements, because
there may be applications out there depending on it - e.g. highlighting,
spell-correction - and here too, unstable packages can be marked with
a temporary warning such as those we currently have for search.function.

benchmark is different in that - I think - there are no applications that
depend on it, so perhaps we can have more flexibility in it?

Doron

Re: Back Compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

One more thought on back compatibility:

Do we have the same requirements for any and all contrib modules?  I  
am especially thinking about the benchmark contrib, but it probably  
applies to others as well.

-Grant


On Jan 24, 2008, at 8:42 AM, Grant Ingersoll wrote:

>
> On Jan 24, 2008, at 4:27 AM, Michael McCandless wrote:
>
>>
>> Grant Ingersoll wrote:
>>
>>> Yes, I agree these are what this is about (despite the divergence into
>>> locking).
>>>
>>> As I see it, the question is about whether we should try to do
>>> major releases on the order of a year, rather than the current 2+
>>> year schedule, and also how to best handle bad behavior when
>>> producing tokens that previous applications rely on.
>>>
>>> On the first case, we said in the past that we would try to do minor
>>> releases more frequently (on the order of once a quarter), but so far
>>> this hasn't happened.  However, it has only been one release,
>>> and it did have a lot of big changes that warranted longer  
>>> testing.  I do agree with Michael M. that we have done a good job  
>>> of keeping back compatibility.  I still don't know if trying to  
>>> clean out deprecations once a year puts some onerous task on  
>>> people when it comes to upgrading as opposed to doing every two  
>>> years.  Do people really have code that they never compile or work  
>>> on in over a year?  If they do, do they care about upgrading?  It  
>>> clearly means they are happy w/ Lucene and don't need any bug  
>>> fixes.  I can understand this being a bigger issue if it were on  
>>> the order of every 6 months or less, but that isn't what I am  
>>> proposing.  I guess my suggestion would be that we try to get back  
>>> onto the once a quarter release goal, which will more than likely  
>>> lead to a major release in the 1-1.5 year time frame.  That being  
>>> said, I am fine with maintaining the status quo concerning back
>>> compatibility, as I think those arguments are compelling.  On the
>>> interface thing, I wish there were an @introducing annotation that
>>> could announce the presence of a new method and would give a  
>>> warning up until the version specified is met, at which point it  
>>> would break the compile, but I realize the semantics of that are  
>>> pretty weird, so...
>>
>> I do think we should try for minor releases more frequently,  
>> independent of the backwards compatibility question (how often to  
>> do major releases) :)
>>
>
> +1
>
> The question then becomes what can we do to improve our development  
> process?
>
>> I think major releases should be done only when a major feature  
>> truly "forces" us to (which Java 1.5 has) and not because we want  
>> to clean out the accumulated cruft we are carrying forward to  
>> preserve backwards compatibility.
>>
>>> As for the other issue concerning things like token issues, I  
>>> think it is reasonable to fix the bug and just let people know it  
>>> will change indexing, but try to allow for the old way if it is  
>>> not too onerous.  Chances are most people aren't even aware of it,
>>> and thus telling them about it may actually cause them to consider
>>> it.  For things like maxFieldLength, etc., back compat. is a
>>> reasonable thing to preserve.
>>
>> So, in hindsight, the acronym/host setting for StandardAnalyzer  
>> really should have defaulted to "true", meaning the bug is fixed,  
>> but users who somehow depend on the bug (which should be a tiny  
>> minority) have an avenue (setReplaceInvalidAcronym) to keep back  
>> compatibility if needed even on a minor release, right?  I agree.   
>> (And so in 2.4 we should fix the default to true?).
>
>
>>
>>
>> I think for such issues where it's a very minor break in backwards  
>> compatibility, we should make the break, and very carefully  
>> document this in the "Changes in runtime behavior" section, even  
>> within a minor release.  I don't think such changes should drive us  
>> to a major release.
>
>
> +1
>
> -Grant
>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






Re: Back Compatibility

Posted by Chris Hostetter <ho...@fucit.org>.

: > So, in hindsight, the acronym/host setting for StandardAnalyzer really
: > should have defaulted to "true", meaning the bug is fixed, but users who
: > somehow depend on the bug (which should be a tiny minority) have an avenue
: > (setReplaceInvalidAcronym) to keep back compatibility if needed even on a
: > minor release, right?  I agree.  (And so in 2.4 we should fix the default to
: > true?).

: > I think for such issues where it's a very minor break in backwards
: > compatibility, we should make the break, and very carefully document this in
: > the "Changes in runtime behavior" section, even within a minor release.  I
: > don't think such changes should drive us to a major release.

: +1

I've made some verbiage changes to BackwardsCompatibility to document that
we may in fact make runtime behavior changes which are not strictly
"backwards compatible" and what commitments we have to letting users
force the old behavior if we make a change like this in a minor release.

most of this verbiage is just me making stuff up based on this thread ...
it is absolutely open for discussion (and editing by people with more
grammar sense than me)...

http://wiki.apache.org/lucene-java/BackwardsCompatibility



-Hoss



Re: Back Compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

On Jan 24, 2008, at 4:27 AM, Michael McCandless wrote:

>
> Grant Ingersoll wrote:
>
>> Yes, I agree these are what this is about (despite the divergence into
>> locking).
>>
>> As I see it, the question is about whether we should try to do
>> major releases on the order of a year, rather than the current 2+
>> year schedule, and also how to best handle bad behavior when
>> producing tokens that previous applications rely on.
>>
>> On the first case, we said in the past that we would try to do minor
>> releases more frequently (on the order of once a quarter), but so far
>> this hasn't happened.  However, it has only been one release,
>> and it did have a lot of big changes that warranted longer  
>> testing.  I do agree with Michael M. that we have done a good job  
>> of keeping back compatibility.  I still don't know if trying to  
>> clean out deprecations once a year puts some onerous task on people  
>> when it comes to upgrading as opposed to doing every two years.  Do  
>> people really have code that they never compile or work on in over  
>> a year?  If they do, do they care about upgrading?  It clearly  
>> means they are happy w/ Lucene and don't need any bug fixes.  I can  
>> understand this being a bigger issue if it were on the order of  
>> every 6 months or less, but that isn't what I am proposing.  I  
>> guess my suggestion would be that we try to get back onto the once  
>> a quarter release goal, which will more than likely lead to a major  
>> release in the 1-1.5 year time frame.  That being said, I am fine  
>> with maintaining the status quo concerning back compatibility, as I
>> think those arguments are compelling.  On the interface thing, I
>> wish there were an @introducing annotation that could announce the
>> presence of a new method and would give a warning up until the  
>> version specified is met, at which point it would break the  
>> compile, but I realize the semantics of that are pretty weird, so...
>
> I do think we should try for minor releases more frequently,  
> independent of the backwards compatibility question (how often to do  
> major releases) :)
>

+1

The question then becomes what can we do to improve our development  
process?

> I think major releases should be done only when a major feature  
> truly "forces" us to (which Java 1.5 has) and not because we want to  
> clean out the accumulated cruft we are carrying forward to preserve  
> backwards compatibility.
>
>> As for the other issue concerning things like token issues, I think  
>> it is reasonable to fix the bug and just let people know it will  
>> change indexing, but try to allow for the old way if it is not too
>> onerous.  Chances are most people aren't even aware of it, and thus
>> telling them about it may actually cause them to consider it.  For
>> things like maxFieldLength, etc., back compat. is a reasonable
>> thing to preserve.
>
> So, in hindsight, the acronym/host setting for StandardAnalyzer  
> really should have defaulted to "true", meaning the bug is fixed,  
> but users who somehow depend on the bug (which should be a tiny  
> minority) have an avenue (setReplaceInvalidAcronym) to keep back  
> compatibility if needed even on a minor release, right?  I agree.   
> (And so in 2.4 we should fix the default to true?).


>
>
> I think for such issues where it's a very minor break in backwards  
> compatibility, we should make the break, and very carefully document  
> this in the "Changes in runtime behavior" section, even within a  
> minor release.  I don't think such changes should drive us to a  
> major release.


+1

-Grant


Re: Back Compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

Grant Ingersoll wrote:

> Yes, I agree these are what this is about (despite the divergence into
> locking).
>
> As I see it, the question is about whether we should try to do
> major releases on the order of a year, rather than the current 2+
> year schedule, and also how to best handle bad behavior when
> producing tokens that previous applications rely on.
>
> On the first case, we said in the past that we would try to do minor
> releases more frequently (on the order of once a quarter), but so far
> this hasn't happened.  However, it has only been one release,
> and it did have a lot of big changes that warranted longer  
> testing.  I do agree with Michael M. that we have done a good job  
> of keeping back compatibility.  I still don't know if trying to  
> clean out deprecations once a year puts some onerous task on people  
> when it comes to upgrading as opposed to doing every two years.  Do  
> people really have code that they never compile or work on in over  
> a year?  If they do, do they care about upgrading?  It clearly  
> means they are happy w/ Lucene and don't need any bug fixes.  I can  
> understand this being a bigger issue if it were on the order of  
> every 6 months or less, but that isn't what I am proposing.  I  
> guess my suggestion would be that we try to get back onto the once  
> a quarter release goal, which will more than likely lead to a major  
> release in the 1-1.5 year time frame.  That being said, I am fine  
> with maintaining the status quo concerning back compatibility, as I
> think those arguments are compelling.  On the interface thing, I
> wish there were an @introducing annotation that could announce the
> presence of a new method and would give a warning up until the  
> version specified is met, at which point it would break the  
> compile, but I realize the semantics of that are pretty weird, so...

I do think we should try for minor releases more frequently,  
independent of the backwards compatibility question (how often to do  
major releases) :)

I think major releases should be done only when a major feature truly  
"forces" us to (which Java 1.5 has) and not because we want to clean  
out the accumulated cruft we are carrying forward to preserve  
backwards compatibility.

> As for the other issue concerning things like token issues, I think  
> it is reasonable to fix the bug and just let people know it will  
> change indexing, but try to allow for the old way if it is not too
> onerous.  Chances are most people aren't even aware of it, and thus
> telling them about it may actually cause them to consider it.  For
> things like maxFieldLength, etc., back compat. is a reasonable
> thing to preserve.

So, in hindsight, the acronym/host setting for StandardAnalyzer  
really should have defaulted to "true", meaning the bug is fixed, but  
users who somehow depend on the bug (which should be a tiny minority)  
have an avenue (setReplaceInvalidAcronym) to keep back compatibility  
if needed even on a minor release, right?  I agree.  (And so in 2.4  
we should fix the default to true?).
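
To make the "fixed by default, legacy on request" idea concrete, here is a
minimal sketch. It is not Lucene's actual StandardAnalyzer code: the class
AcronymFixSetting, the property name, and typeForHostToken() are all
hypothetical stand-ins, and the host/acronym labels are only a simplified
picture of the mislabeling discussed here.

```java
// Hypothetical sketch: the fixed behavior is the default, and users who
// depend on the old bug can opt back in via a setter or a system property.
final class AcronymFixSetting {

    // Assumed property name, invented for this example only.
    static final String LEGACY_PROPERTY = "example.analysis.legacyAcronymBehavior";

    // Defaults to true (bug fixed) unless the legacy property is set to "true".
    private boolean replaceInvalidAcronym = !Boolean.getBoolean(LEGACY_PROPERTY);

    boolean isReplaceInvalidAcronym() {
        return replaceInvalidAcronym;
    }

    // Per-instance escape hatch for applications that rely on the old output.
    void setReplaceInvalidAcronym(boolean replaceInvalidAcronym) {
        this.replaceInvalidAcronym = replaceInvalidAcronym;
    }

    // Simplified stand-in for the tokenizer: returns the type label a
    // host-like token would receive under the current setting.
    String typeForHostToken() {
        return replaceInvalidAcronym ? "<HOST>" : "<ACRONYM>";
    }
}
```

With something like this, an upgrading user gets the corrected label without
touching any code, and the rare user who depends on the old label either
calls setReplaceInvalidAcronym(false) or sets the property at JVM startup.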

I think for such issues where it's a very minor break in backwards  
compatibility, we should make the break, and very carefully document  
this in the "Changes in runtime behavior" section, even within a  
minor release.  I don't think such changes should drive us to a major  
release.

Mike


Re: Back Compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

Yes, I agree these are what this is about (despite the divergence into
locking).

As I see it, the question is about whether we should try to do major
releases on the order of a year, rather than the current 2+ year
schedule, and also how to best handle bad behavior when producing
tokens that previous applications rely on.

On the first case, we said in the past that we would try to do minor
releases more frequently (on the order of once a quarter), but so far
this hasn't happened.  However, it has only been one release, and it
did have a lot of big changes that warranted longer testing.  I do
agree with Michael M. that we have done a good job of keeping back
compatibility.  I still don't know if trying to clean out deprecations
once a year puts some onerous task on people when it comes to
upgrading, as opposed to doing it every two years.  Do people really
have code that they never compile or work on in over a year?  If they
do, do they care about upgrading?  It clearly means they are happy w/
Lucene and don't need any bug fixes.  I can understand this being a
bigger issue if it were on the order of every 6 months or less, but
that isn't what I am proposing.  I guess my suggestion would be that
we try to get back onto the once a quarter release goal, which will
more than likely lead to a major release in the 1-1.5 year time
frame.  That being said, I am fine with maintaining the status quo
concerning back compatibility, as I think those arguments are
compelling.  On the interface thing, I wish there were an @introducing
annotation that could announce the presence of a new method and would
give a warning up until the version specified is met, at which point
it would break the compile, but I realize the semantics of that are
pretty weird, so...
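
For what it's worth, here is a rough sketch of what such an annotation might
look like. Nothing like this exists in Java or Lucene as far as I know: the
names Introducing, requiredFrom, ExampleFilter and IntroducingReport are all
made up, and the warn-then-break-the-compile part would need an annotation
processor or build step that is not shown. The sketch only lists tagged
methods via reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotation: marks a method that is optional today but will
// become mandatory for subclasses/implementors from the given version on.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Introducing {
    String requiredFrom();
}

// Made-up API class: the new method ships with a default body so that
// existing subclasses keep compiling until the stated version.
abstract class ExampleFilter {
    abstract String next();

    @Introducing(requiredFrom = "3.0")
    String next(String reusable) {
        return next();
    }
}

// Lists the tagged methods of a class; a build tool could print these as
// warnings before the stated version and as errors afterwards.
final class IntroducingReport {
    static List<String> pending(Class<?> type) {
        List<String> out = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            Introducing tag = m.getAnnotation(Introducing.class);
            if (tag != null) {
                out.add(m.getName() + " becomes mandatory in " + tag.requiredFrom());
            }
        }
        return out;
    }
}
```

Calling IntroducingReport.pending(ExampleFilter.class) yields a single entry
for the tagged next(String) overload; the untagged next() is ignored.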

As for the other issue concerning things like token issues, I think it
is reasonable to fix the bug and just let people know it will change
indexing, but try to allow for the old way if it is not too onerous.
Chances are most people aren't even aware of it, and thus telling them
about it may actually cause them to consider it.  For things like
maxFieldLength, etc., back compat. is a reasonable thing to
preserve.

Cheers,
Grant


On Jan 23, 2008, at 6:24 PM, DM Smith wrote:

> Top posting because this is a response to the thread as a whole.
>
> It appears that this thread has identified some different reasons  
> for "needing" to break compatibility:
> 1) A current behavior is now deemed bad or wrong. Examples: the  
> silent truncation of large documents or an analyzer that works  
> incorrectly.
> 2) Performance tuning such as seen in Token, allowing reuse.
> 3) Support of a new language feature, e.g. generics, that makes the
> code "better".
> 4) A new feature requires a change to the existing API.
>
> Perhaps there were others? Maybe specifics are in Jira.
>
> It seems to me that the Lucene developers have done an excellent job  
> at figuring out how to maintain compatibility. This is a testament  
> to how well grounded the design of the API actually is, from early  
> on and even now. And changes seem to be well thought out, well  
> designed and carefully implemented.
>
> I think that when it really gets down to it, the Lucene API will  
> stay very stable because of this.
>
> On a side note, the cLucene project seems to be languishing (still  
> trying to get to 2.0) and any stability of the API is a good thing  
> for it. And perhaps for the other "ports" as well.
>
> Again many thanks for all your hard work,
> 	DM Smith, a thankful "parasite" :)
>
> On Jan 23, 2008, at 5:16 PM, Michael McCandless wrote:
>
>>
>> chris Hostetter wrote:
>>
>>>
>>> : I do like the idea of a static/system property to match legacy
>>> : behavior.  For example, the bugs around how StandardTokenizer
>>> : mislabels tokens (eg LUCENE-1100), this would be the perfect  
>>> solution.
>>> : Clearly those are silly bugs that should be fixed, quickly, with  
>>> this
>>> : back-compatible mode to keep the bug in place.
>>> :
>>> : We might want to, instead, have ctors for many classes take a  
>>> required
>>> : arg which states the version of Lucene you are using?  So if you  
>>> are
>>> : writing a new app you would pass in the current version.  Then, on
>>> : dropping in a future Lucene JAR, we could use that arg to  
>>> enforce the
>>> : right backwards compatibility.  This would save users from  
>>> having to
>>> : realize they are hitting one of these situations and then know  
>>> to go
>>> : set the right static/property to retain the buggy behavior.
>>>
>>> I'm not sure that this would be better though ... when i write my code, i
>>> pass "2.3" to all these constructors (or factory methods) and then later i
>>> want to upgrade to 2.4 to get all the new performance goodness ... i
>>> shouldn't have to change all those constructor calls to get all the 2.4
>>> goodness, i should be able to leave my code as is -- but if i do that,
>>> then i might not get all the 2.4 goodness, (like improved
>>> tokenization, or more precise segment merging) because some of that
>>> goodness violates previous assumptions that some code might have had ...
>>> my code doesn't have those assumptions, i know nothing about them, i'll
>>> take whatever behavior the Lucene Developers recommend (unless i see
>>> evidence that it breaks something, in which case i'll happily set a
>>> system property or something that the release notes say will force the
>>> old behavior).
>>>
>>> The basic principle being: by default, give users the behavior  
>>> that is
>>> generally viewed as "correct" -- but give them the option to force
>>> "uncorrect" legacy behavior.
>>
>> OK, I agree: the vast majority of users upgrading would in fact  
>> want all of the changes in the new release.  And then the rare user  
>> who is affected by that bug fix to StandardTokenizer would have to  
>> set the compatibility mode.  So it makes sense for you to get all  
>> changes on upgrading (and NOT specify the legacy version in all  
>> ctors).
>>
>>> : Also, backporting is extremely costly over time.  I'd much  
>>> rather keep
>>> : compatibility for longer on our forward releases, than spend our
>>> : scarce resources moving changes back.
>>>
>>> +1
>>>
>>> : So to summarize ... I think we should have (keep) a high  
>>> tolerance for
>>> : cruft to maintain API compatibility.  I think our current approach
>>> : (try hard to keep compatibility during "minor" releases, then
>>> : deprecate, then remove APIs on a major release; do major  
>>> releases only
>>> : when truly required) is a good one.
>>>
>>> i'm with you for the most part, it's just the definition of "when truly
>>> required" that tends to hang people up ... there's a chicken vs egg
>>> problem of deciding whether the code should drive what the next release
>>> number is: "i've added a bitch'n feature but it requires adding a method
>>> to an interface, therefore the next release must be called 4.0" ... vs the
>>> mindset that "we just had a 3.0 release, it's too soon for another major
>>> release, the next release should be called 3.1, so we need to hold off on
>>> committing non backwards compatible changes for a while."
>>>
>>> I'm in the first camp: version numbers should be descriptive, information
>>> carrying, labels for releases -- but the version number of a release
>>> should be dictated by the code contained in that release.  (if that means
>>> the next version after 3.0.0 is 4.0.0, then so be it.)
>>
>> Well, I am wary of doing major releases too often.  Though I do
>> agree that the version number should be a "fastmatch" for reading  
>> through CHANGES.txt.
>>
>> Say we do this, and zoom forward 2 years when we're up to 6.0, then  
>> poor users stuck on 1.9 will dread upgrading, but probably shouldn't.
>>
>> One of the amazing things about Lucene, to me, is how many really  
>> major changes we have been able to make while not in fact breaking  
>> backwards compatibility (too much).  Being very careful not to make  
>> things public, intentionally not committing to things like exactly  
>> when does a flush or commit or merge actually happen, marking new  
>> APIs as experimental and freely subject to change, using abstract  
>> classes not interfaces, are all wonderful tools that Lucene employs  
>> (and should continue to do so), to enable sizable changes in the  
>> future while keeping backwards compatibility.
>>
>> Allowing for future backwards compatibility is one of the most  
>> important things we all do when we make changes to Lucene!
>>
>> Mike
>>
>
>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






Re: Back Compatibility

Posted by DM Smith <dm...@gmail.com>.

Top posting because this is a response to the thread as a whole.

It appears that this thread has identified some different reasons for  
"needing" to break compatibility:
1) A current behavior is now deemed bad or wrong. Examples: the silent  
truncation of large documents or an analyzer that works incorrectly.
2) Performance tuning such as seen in Token, allowing reuse.
3) Support of a new language feature, e.g. generics, that makes the  
code "better".
4) A new feature requires a change to the existing API.

Perhaps there were others? Maybe specifics are in Jira.

It seems to me that the Lucene developers have done an excellent job  
at figuring out how to maintain compatibility. This is a testament to  
how well grounded the design of the API actually is, from early on and  
even now. And changes seem to be well thought out, well designed and  
carefully implemented.

I think that when it really gets down to it, the Lucene API will stay  
very stable because of this.

On a side note, the cLucene project seems to be languishing (still  
trying to get to 2.0) and any stability of the API is a good thing for  
it. And perhaps for the other "ports" as well.

Again many thanks for all your hard work,
	DM Smith, a thankful "parasite" :)

On Jan 23, 2008, at 5:16 PM, Michael McCandless wrote:

>
> chris Hostetter wrote:
>
>>
>> : I do like the idea of a static/system property to match legacy
>> : behavior.  For example, the bugs around how StandardTokenizer
>> : mislabels tokens (eg LUCENE-1100), this would be the perfect solution.
>> : Clearly those are silly bugs that should be fixed, quickly, with this
>> : back-compatible mode to keep the bug in place.
>> :
>> : We might want to, instead, have ctors for many classes take a required
>> : arg which states the version of Lucene you are using?  So if you are
>> : writing a new app you would pass in the current version.  Then, on
>> : dropping in a future Lucene JAR, we could use that arg to enforce the
>> : right backwards compatibility.  This would save users from having to
>> : realize they are hitting one of these situations and then know to go
>> : set the right static/property to retain the buggy behavior.
>>
>> I'm not sure that this would be better though ... when i write my code, i
>> pass "2.3" to all these constructors (or factory methods) and then later i
>> want to upgrade to 2.4 to get all the new performance goodness ... i
>> shouldn't have to change all those constructor calls to get all the 2.4
>> goodness, i should be able to leave my code as is -- but if i do that,
>> then i might not get all the 2.4 goodness, (like improved
>> tokenization, or more precise segment merging) because some of that
>> goodness violates previous assumptions that some code might have had ...
>> my code doesn't have those assumptions, i know nothing about them, i'll
>> take whatever behavior the Lucene Developers recommend (unless i see
>> evidence that it breaks something, in which case i'll happily set a
>> system property or something that the release notes say will force the
>> old behavior).
>>
>> The basic principle being: by default, give users the behavior that  
>> is
>> generally viewed as "correct" -- but give them the option to force
>> "uncorrect" legacy behavior.
>
> OK, I agree: the vast majority of users upgrading would in fact want  
> all of the changes in the new release.  And then the rare user who  
> is affected by that bug fix to StandardTokenizer would have to set  
> the compatibility mode.  So it makes sense for you to get all  
> changes on upgrading (and NOT specify the legacy version in all  
> ctors).
>
>> : Also, backporting is extremely costly over time.  I'd much rather keep
>> : compatibility for longer on our forward releases, than spend our
>> : scarce resources moving changes back.
>>
>> +1
>>
>> : So to summarize ... I think we should have (keep) a high tolerance for
>> : cruft to maintain API compatibility.  I think our current approach
>> : (try hard to keep compatibility during "minor" releases, then
>> : deprecate, then remove APIs on a major release; do major releases only
>> : when truly required) is a good one.
>>
>> i'm with you for the most part, it's just the definition of "when truly
>> required" that tends to hang people up ... there's a chicken vs egg
>> problem of deciding whether the code should drive what the next release
>> number is: "i've added a bitch'n feature but it requires adding a method
>> to an interface, therefore the next release must be called 4.0" ... vs the
>> mindset that "we just had a 3.0 release, it's too soon for another major
>> release, the next release should be called 3.1, so we need to hold off on
>> committing non-backwards-compatible changes for a while."
>>
>> I'm in the first camp: version numbers should be descriptive, information
>> carrying, labels for releases -- but the version number of a release
>> should be dictated by the code contained in that release.  (if that means
>> the next version after 3.0.0 is 4.0.0, then so be it.)
>
> Well, I am wary of doing major releases too often.  Though I do  
> agree that the version number should be a "fastmatch" for reading  
> through CHANGES.txt.
>
> Say we do this, and zoom forward 2 years when we're up to 6.0, then  
> poor users stuck on 1.9 will dread upgrading, but probably shouldn't.
>
> One of the amazing things about Lucene, to me, is how many really  
> major changes we have been able to make while not in fact breaking  
> backwards compatibility (too much).  Being very careful not to make  
> things public, intentionally not committing to things like exactly  
> when does a flush or commit or merge actually happen, marking new  
> APIs as experimental and freely subject to change, using abstract  
> classes not interfaces, are all wonderful tools that Lucene employs  
> (and should continue to do so), to enable sizable changes in the  
> future while keeping backwards compatibility.
>
> Allowing for future backwards compatibility is one of the most  
> important things we all do when we make changes to Lucene!
>
> Mike



Re: Back Compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

chris Hostetter wrote:

>
> : I do like the idea of a static/system property to match legacy
> : behavior.  For example, the bugs around how StandardTokenizer
> : mislabels tokens (eg LUCENE-1100), this would be the perfect solution.
> : Clearly those are silly bugs that should be fixed, quickly, with this
> : back-compatible mode to keep the bug in place.
> :
> : We might want to, instead, have ctors for many classes take a required
> : arg which states the version of Lucene you are using?  So if you are
> : writing a new app you would pass in the current version.  Then, on
> : dropping in a future Lucene JAR, we could use that arg to enforce the
> : right backwards compatibility.  This would save users from having to
> : realize they are hitting one of these situations and then know to go
> : set the right static/property to retain the buggy behavior.
>
> I'm not sure that this would be better though ... when i write my code, i
> pass "2.3" to all these constructors (or factory methods) and then later i
> want to upgrade to 2.4 to get all the new performance goodness ... i
> shouldn't have to change all those constructor calls to get all the 2.4
> goodness, i should be able to leave my code as is -- but if i do that,
> then i might not get all the 2.4 goodness, (like improved
> tokenization, or more precise segment merging) because some of that
> goodness violates previous assumptions that some code might have had ...
> my code doesn't have those assumptions, i know nothing about them, i'll
> take whatever behavior the Lucene Developers recommend (unless i see
> evidence that it breaks something, in which case i'll happily set a
> system property or something that the release notes say will force the
> old behavior).
>
> The basic principle being: by default, give users the behavior that is
> generally viewed as "correct" -- but give them the option to force
> "uncorrect" legacy behavior.

OK, I agree: the vast majority of users upgrading would in fact want  
all of the changes in the new release.  And then the rare user who is  
affected by that bug fix to StandardTokenizer would have to set the  
compatibility mode.  So it makes sense for you to get all changes on  
upgrading (and NOT specify the legacy version in all ctors).

> : Also, backporting is extremely costly over time.  I'd much rather keep
> : compatibility for longer on our forward releases, than spend our
> : scarce resources moving changes back.
>
> +1
>
> : So to summarize ... I think we should have (keep) a high tolerance for
> : cruft to maintain API compatibility.  I think our current approach
> : (try hard to keep compatibility during "minor" releases, then
> : deprecate, then remove APIs on a major release; do major releases only
> : when truly required) is a good one.
>
> i'm with you for the most part, it's just the definition of "when truly
> required" that tends to hang people up ... there's a chicken vs egg
> problem of deciding whether the code should drive what the next release
> number is: "i've added a bitch'n feature but it requires adding a method
> to an interface, therefore the next release must be called 4.0" ... vs the
> mindset that "we just had a 3.0 release, it's too soon for another major
> release, the next release should be called 3.1, so we need to hold off on
> committing non-backwards-compatible changes for a while."
>
> I'm in the first camp: version numbers should be descriptive, information
> carrying, labels for releases -- but the version number of a release
> should be dictated by the code contained in that release.  (if that means
> the next version after 3.0.0 is 4.0.0, then so be it.)

Well, I am wary of doing major releases too often.  Though I do  
agree that the version number should be a "fastmatch" for reading  
through CHANGES.txt.

Say we do this, and zoom forward 2 years when we're up to 6.0, then  
poor users stuck on 1.9 will dread upgrading, but probably shouldn't.

One of the amazing things about Lucene, to me, is how many really  
major changes we have been able to make while not in fact breaking  
backwards compatibility (too much).  Being very careful not to make  
things public, intentionally not committing to things like exactly  
when does a flush or commit or merge actually happen, marking new  
APIs as experimental and freely subject to change, using abstract  
classes not interfaces, are all wonderful tools that Lucene employs  
(and should continue to do so), to enable sizable changes in the  
future while keeping backwards compatibility.

Allowing for future backwards compatibility is one of the most  
important things we all do when we make changes to Lucene!

Mike


Re: Back Compatibility

Posted by Chris Hostetter <ho...@fucit.org>.

: I do like the idea of a static/system property to match legacy
: behavior.  For example, the bugs around how StandardTokenizer
: mislabels tokens (eg LUCENE-1100), this would be the perfect solution.
: Clearly those are silly bugs that should be fixed, quickly, with this
: back-compatible mode to keep the bug in place.
: 
: We might want to, instead, have ctors for many classes take a required
: arg which states the version of Lucene you are using?  So if you are
: writing a new app you would pass in the current version.  Then, on
: dropping in a future Lucene JAR, we could use that arg to enforce the
: right backwards compatibility.  This would save users from having to
: realize they are hitting one of these situations and then know to go
: set the right static/property to retain the buggy behavior.

I'm not sure that this would be better though ... when i write my code, i 
pass "2.3" to all these constructors (or factory methods) and then later i 
want to upgrade to 2.4 to get all the new performance goodness ... i 
shouldn't have to change all those constructor calls to get all the 2.4 
goodness, i should be able to leave my code as is -- but if i do that, 
then i might not get all the 2.4 goodness, (like improved 
tokenization, or more precise segment merging) because some of that 
goodness violates previous assumptions that some code might have had ... 
my code doesn't have those assumptions, i know nothing about them, i'll 
take whatever behavior the Lucene Developers recommend (unless i see 
evidence that it breaks something, in which case i'll happily set a 
system property or something that the release notes say will force the 
old behavior).

The basic principle being: by default, give users the behavior that is 
generally viewed as "correct" -- but give them the option to force 
"uncorrect" legacy behavior.

: Also, backporting is extremely costly over time.  I'd much rather keep
: compatibility for longer on our forward releases, than spend our
: scarce resources moving changes back.

+1

: So to summarize ... I think we should have (keep) a high tolerance for
: cruft to maintain API compatibility.  I think our current approach
: (try hard to keep compatibility during "minor" releases, then
: deprecate, then remove APIs on a major release; do major releases only
: when truly required) is a good one.

i'm with you for the most part, it's just the definition of "when truly 
required" that tends to hang people up ... there's a chicken vs egg 
problem of deciding whether the code should drive what the next release 
number is: "i've added a bitch'n feature but it requires adding a method 
to an interface, therefore the next release must be called 4.0" ... vs the 
mindset that "we just had a 3.0 release, it's too soon for another major 
release, the next release should be called 3.1, so we need to hold off on 
committing non-backwards-compatible changes for a while."

I'm in the first camp: version numbers should be descriptive, information 
carrying, labels for releases -- but the version number of a release 
should be dictated by the code contained in that release.  (if that means 
the next version after 3.0.0 is 4.0.0, then so be it.)


-Hoss



Re: Back Compatibility

Posted by Michael McCandless <lu...@mikemccandless.com>.

Catching up here...

Re the fracturing when Maven went from v1 -> v2: I think Lucene is a
totally different animal.  Maven is an immense framework; Lucene is a
fairly small "core" set of APIs.  I think for these "core" type
packages it's very important to keep drop-in compatibility as long as
possible.

I think we _really_ want our users to upgrade.  Yes, there are a lot of
class A people who will forever be stuck in the past, but let's not make
barriers for them to switch to class C, or for class C to upgrade.
When someone is running old versions of Lucene it only hurts their (&
their friends' & their users') perception of Lucene.

I think we've done a good job keeping backwards compatibility despite
some rather major recent changes:

   * We now do segment merging in a BG thread

   * We now flush by RAM (16 MB default) not at 10 buffered docs

   * Merge selection is based on size of segment in bytes not doc count

   * We will (in 2.4) "autoCommit" far less often (LUCENE-1044)

Now, we could have forced these into a major release instead, but, I
don't think we should have.  As much as possible I think we should
keep on minor releases (keep backwards compatibility) so people can
always more easily upgrade.

As far as I know, the only solid reason for 3.0 is the
non-backwards-compatible switch to Java 1.5?

I do like the idea of a static/system property to match legacy
behavior.  For the bugs around how StandardTokenizer mislabels
tokens (e.g. LUCENE-1100), this would be the perfect solution.
Clearly those are silly bugs that should be fixed, quickly, with this
back-compatible mode to keep the bug in place.

We might want to, instead, have ctors for many classes take a required
arg which states the version of Lucene you are using?  So if you are
writing a new app you would pass in the current version.  Then, on
dropping in a future Lucene JAR, we could use that arg to enforce the
right backwards compatibility.  This would save users from having to
realize they are hitting one of these situations and then know to go
set the right static/property to retain the buggy behavior.
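
To make the constructor-argument idea concrete, here is a minimal sketch.
Everything in it is an assumption for illustration: CompatVersion and
ExampleTokenizer are made-up names, not Lucene classes, and the "bug"
(numbers labeled as plain words before 2.4) is invented.

```java
// Hypothetical sketch: CompatVersion and ExampleTokenizer are made-up names,
// not Lucene classes, and the "bug" below is invented for illustration.
enum CompatVersion {
    V2_3, V2_4;

    /** True if this version is the same as or newer than the given one. */
    boolean onOrAfter(CompatVersion other) {
        return compareTo(other) >= 0;
    }
}

final class ExampleTokenizer {
    private final CompatVersion matchVersion;

    /** The caller declares which release's behavior the app was written against. */
    ExampleTokenizer(CompatVersion matchVersion) {
        if (matchVersion == null) {
            throw new IllegalArgumentException("matchVersion is required");
        }
        this.matchVersion = matchVersion;
    }

    /** Invented bug: releases before 2.4 labeled numbers as plain words. */
    String typeOf(String token) {
        boolean numeric = token.matches("\\d+");
        if (numeric && matchVersion.onOrAfter(CompatVersion.V2_4)) {
            return "<NUM>";
        }
        return "<WORD>";
    }
}

final class VersionArgDemo {
    public static void main(String[] args) {
        // An app written against 2.3 keeps the old labels after a JAR upgrade.
        System.out.println(new ExampleTokenizer(CompatVersion.V2_3).typeOf("42"));
        // A new app passes the current version and gets the fix.
        System.out.println(new ExampleTokenizer(CompatVersion.V2_4).typeOf("42"));
    }
}
```

The trade-off in this design is that an application pinned to V2_3 never
sees the fix until someone edits the constructor call.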

Also, backporting is extremely costly over time.  I'd much rather keep
compatibility for longer on our forward releases, than spend our
scarce resources moving changes back.

So to summarize ... I think we should have (keep) a high tolerance for
cruft to maintain API compatibility.  I think our current approach
(try hard to keep compatibility during "minor" releases, then
deprecate, then remove APIs on a major release; do major releases only
when truly required) is a good one.

Mike

Chris Hostetter wrote:

>
> : To paraphrase a dead English guy: A rose by any other name is still the same,
> : right?
> :
> : Basically, all the version number tick saves them from is having to read the
> : CHANGES file, right?
>
> Correct: i'm not disagreeing with your basic premise, just pointing out
> that it can be done with the current model, and that predictable "version
> identifiers" are a good idea when dealing with backwards compatibility.
>
> : Thus, the version numbers become meaningless; the question is what do we see
> : as best for Lucene?  We could just as easily call it Lucene Summer '08 and
> : Lucene Winter '08.  Heck, we could pull the old MS Word 2.0 to MS Word 6.0 and
>
> well .. i would argue that with what you hypothesized *then* version numbers
> would become meaningless ... having 3.0, 3.1, 3.2, 4.0 would be no
> different than having 3, 4, 5, 6 -- our version numbers would be
> identifiers with no other context ... i'm just saying we should keep the
> context in so that you know whether or not version X is backwards
> compatible with version Y.
>
> Which is not to say that we shouldn't change our version number format...
>
> i.e., we could start using four-part version numbers: 3.2.5.0 instead of 3.5.0
>
>    3: major version #
>       identifies file format back compatibility (as today)
>    2: api compat version #
>       classes/methods may be removed when this changes
>    5: minor version #
>       new methods may be added when this changes (as today)
>    0: patch version #
>       changes only when there are serious bug fixes
>
> ...that might mean that our version numbers go...
>
> 3.0.0.0
> 3.0.1.0
> 3.1.0.0
> 3.1.1.0
> 3.1.2.0
> 3.2.0.0
>
> ...where most numbers never get above "2" but at least the version number
> conveys useful compatibility information (at no added developer "cost")
>
>
>
> -Hoss
>
>


Re: Back Compatibility

Posted by Chris Hostetter <ho...@fucit.org>.

: To paraphrase a dead English guy: A rose by any other name is still the same,
: right?
: 
: Basically, all the version number tick saves them from is having to read the
: CHANGES file, right?

Correct: i'm not disagreeing with your basic premise, just pointing out 
that it can be done with the current model, and that predictable "version 
identifiers" are a good idea when dealing with backwards compatibility.

: Thus, the version numbers become meaningless; the question is what do we see
: as best for Lucene?  We could just as easily call it Lucene Summer '08 and
: Lucene Winter '08.  Heck, we could pull the old MS Word 2.0 to MS Word 6.0 and
  
well .. i would argue that with what you hypothesized *then* version numbers 
would become meaningless ... having 3.0, 3.1, 3.2, 4.0 would be no 
different than having 3, 4, 5, 6 -- our version numbers would be 
identifiers with no other context ... i'm just saying we should keep the 
context in so that you know whether or not version X is backwards 
compatible with version Y.

Which is not to say that we shouldn't change our version number format...

i.e., we could start using four-part version numbers: 3.2.5.0 instead of 3.5.0

   3: major version #
      identifies file format back compatibility (as today)
   2: api compat version #
      classes/methods may be removed when this changes
   5: minor version #
      new methods may be added when this changes (as today)
   0: patch version #
      changes only when there are serious bug fixes

...that might mean that our version numbers go...

3.0.0.0
3.0.1.0
3.1.0.0
3.1.1.0
3.1.2.0
3.2.0.0

...where most numbers never get above "2" but at least the version number 
conveys useful compatibility information (at no added developer "cost")
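
As a rough illustration of how such a version number could be read by
tooling, here is a minimal sketch. QuadVersion is a made-up class, and the
"drop-in" rule it encodes (same major and same api compat number, and not
older) is an assumption about how the scheme above would be interpreted.

```java
// Hypothetical sketch of the four-part scheme; QuadVersion is a made-up class.
final class QuadVersion implements Comparable<QuadVersion> {
    final int major;     // file format back compatibility
    final int apiCompat; // classes/methods may be removed when this changes
    final int minor;     // new methods may be added when this changes
    final int patch;     // serious bug fixes only

    QuadVersion(int major, int apiCompat, int minor, int patch) {
        this.major = major;
        this.apiCompat = apiCompat;
        this.minor = minor;
        this.patch = patch;
    }

    /** Parses "major.apiCompat.minor.patch", e.g. "3.2.5.0". */
    static QuadVersion parse(String text) {
        String[] parts = text.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected four numeric parts: " + text);
        }
        return new QuadVersion(
                Integer.parseInt(parts[0]), Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]), Integer.parseInt(parts[3]));
    }

    /** Orders versions field by field, most significant first. */
    @Override
    public int compareTo(QuadVersion o) {
        if (major != o.major) return Integer.compare(major, o.major);
        if (apiCompat != o.apiCompat) return Integer.compare(apiCompat, o.apiCompat);
        if (minor != o.minor) return Integer.compare(minor, o.minor);
        return Integer.compare(patch, o.patch);
    }

    /**
     * Assumption: code compiled against 'older' keeps working if nothing was
     * removed in between, i.e. major and apiCompat are unchanged and this
     * version is not older than 'older'.
     */
    boolean isDropInFor(QuadVersion older) {
        return major == older.major
                && apiCompat == older.apiCompat
                && compareTo(older) >= 0;
    }
}
```

With the sequence listed above, 3.0.1.0 would be a drop-in replacement for
3.0.0.0, while 3.1.0.0 would signal that something was removed.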



-Hoss



Re: Back Compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

On Jan 22, 2008, at 3:45 PM, Chris Hostetter wrote:
>
> Perhaps the crux of the issue is that we as a community need to become
> more willing to crank out "major" releases ... if we just released  
> 3.0 and
> now someone came up with the "Magic" field type and it's really magical
> and we want to start using it but it's not backwards compatible -- well i
> guess our next release just needs to be called 4.0 then ... it's clear
> from the version number that this is a significant change, even if it does
> wind up getting released 3 months after v3.0

To paraphrase a dead English guy: A rose by any other name is still  
the same, right?

Basically, all the version number tick saves them from is having to  
read the CHANGES file, right?

To some extent, I am proposing that we clean out the cruft once a  
year.  Consider it spring cleaning.  If we want to mark it as a major  
version, I am fine with that.  Basically, if history is any  
indication, this would mean our releases will look like (given our  
avg. 6 mos release cycle):
3.0
3.1
4.0
4.1
5.0
5.1

Thus, the version numbers become meaningless; the question is what do  
we see as best for Lucene?  We could just as easily call it Lucene  
Summer '08 and Lucene Winter '08.  Heck, we could pull the old MS Word  
2.0 to MS Word 6.0 and jump to Lucene 6.0 next, too, for all I care.   
I think 1 year is plenty long to keep both user Group B and C happy (A  
will be oblivious).  A once-a-year cleanup of code is, in my mind, not  
too burdensome for those in Group C.  I consider myself in Group C  
for most tools I use (Lucene is probably the exception) and I can't  
recall the last time I had an application that uses Lucene-like stuff  
that I haven't touched in over a year.  But even for those other  
tools, I expect that I am going to have a major upgrade once a year.  
In fact, it is often part of the license agreement from commercial  
companies and I would feel cheated if I didn't get it.

I think one could even argue that Group C would be happier w/ more  
frequent removals of cruft, since it can be handled in a more  
incremental way, versus an all at once upgrade every 2 years.  They  
have the option of how big of a chunk to bite off.

-Grant


Re: Back Compatibility

Posted by Chris Hostetter <ho...@fucit.org>.

: I guess I am suggesting that instead of maintaining the whole major/minor
: thing (not including file format) that we relax a bit and say that any given
: feature we choose to remove or add has to go through two release cycles, which
: according to your averages, would equal just over 1 year's time.  If people
: can't adapt to a coming change that was announced a year ago, then I have to
: wonder why they need to upgrade at all. (OK, that is a little strong, but...)

as someone else pointed out somewhere in this thread (reading it all at 
once i'm losing track) this will make it harder for people to understand 
how much effort it will be to go from version 3.4 to 3.5 ... is that a 
drop in replacement?

Perhaps the crux of the issue is that we as a community need to become 
more willing to crank out "major" releases ... if we just released 3.0 and 
now someone came up with the "Magic" field type and it's really magical 
and we want to start using it but it's not backwards compatible -- well i 
guess our next release just needs to be called 4.0 then ... it's clear 
from the version number that this is a significant change, even if it does 
wind up getting released 3 months after v3.0

: And again, I still think the more pertinent issue that needs to be addressed
: is how to better handle bugs in things like Tokenization, etc. where people
: may have dependencies on broken functionality, but haven't fully comprehended
: that they have such a dependency.  I don't think those fixes/deprecations
: should have to wait for a major release.

I think situations like this are the one place where using system 
properties to force broken/legacy behavior would really make sense ... we 
fix the code so all "new" users get the correct/better behavior, and we 
document in the CHANGES.txt that the behavior has changed.  the code is 
drop-in compatible for anyone who isn't relying on broken behavior, and if 
you are you can set a system property to force the old behavior.
(caveat: to support the few cases people have mentioned where you can't 
set system properties easily (applets i think?) a static method should be 
provided as well, so if you need old broken behavior *AND* you can't use 
system properties you just have to add one line of code to your app)
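
A minimal sketch of that switch, under stated assumptions: the class names,
the property name "example.tokenizer.legacyTypes", and the "bug" (numbers
labeled as plain words) are all invented for illustration; none of this is
an existing Lucene API.

```java
// Hypothetical sketch: these classes and the property name do not exist in Lucene.
final class LegacyBehavior {
    // Assumed property name, for illustration only.
    static final String PROPERTY = "example.tokenizer.legacyTypes";

    // null means "no programmatic override; consult the system property".
    private static volatile Boolean override = null;

    private LegacyBehavior() {}

    /** One-line opt-in for environments that cannot set system properties. */
    static void setLegacy(boolean enabled) {
        override = Boolean.valueOf(enabled);
    }

    /** Clears the programmatic override so the system property decides again. */
    static void clearOverride() {
        override = null;
    }

    static boolean isLegacy() {
        Boolean o = override;
        if (o != null) {
            return o.booleanValue();
        }
        try {
            // True only if the property is set to "true" (case-insensitive).
            return Boolean.getBoolean(PROPERTY);
        } catch (SecurityException e) {
            // Sandboxed callers may be denied property access;
            // fall back to the corrected behavior.
            return false;
        }
    }
}

final class LegacyAwareLabeler {
    private LegacyAwareLabeler() {}

    /** Invented bug: the old code labeled numbers as plain words. */
    static String label(String token) {
        boolean numeric = token.matches("\\d+");
        if (numeric && !LegacyBehavior.isLegacy()) {
            return "<NUM>";   // corrected behavior, the default
        }
        return "<WORD>";      // words, and numbers when legacy mode is forced
    }
}
```

Users who rely on the old labels would either start the JVM with
-Dexample.tokenizer.legacyTypes=true or call LegacyBehavior.setLegacy(true)
once at startup; everyone else gets the fix without touching their code.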


: Does anyone have experience w/ how other open source projects deal with this?

Poorly.

The best solution I've seen is to support multiple "stable" branches.  
we've talked about doing that before, but there haven't been any features 
anyone has stepped up to backport to an older version since that 
discussion. (probably because we've done such a good job of making it easy 
for people to upgrade)

As i mentioned elsewhere in this thread: i worry about the community 
fragmenting if we raise the bar on upgrading in order to lower the bar on 
development ... having multiple "stable" branches seems like it could also 
fragment the community very easily... people using 3.2.X releases not 
being able to interact/help with people using 2.4.Y on the user list 
because certain things work drastically differently.

backporting bug fixes is one thing, but i'm leery of backporting new 
features and performance improvements (not that i would object to anyone 
doing so ... i'm just scared of where it might lead)


-Hoss



Re: Back Compatibility

Posted by Grant Ingersoll <gs...@apache.org>.

I guess I am suggesting that instead of maintaining the whole  
major/minor thing (not including file format) that we relax a bit and  
say that any given feature we choose to remove or add has to go through  
two release cycles, which according to your averages, would equal just  
over 1 year's time.  If people can't adapt to a coming change that was  
announced a year ago, then I have to wonder why they need to upgrade  
at all. (OK, that is a little strong, but...)

And mind you, I am not set in stone about this, I'm just starting the  
conversation, being a bit of a devil's advocate.  I would like to know  
if there is a way we can retain Lucene's solid stability and maturity  
and encourage new and better features and not have to maintain so much  
deprecated code.

I especially think this shorter cycle is useful when it comes to  
deprecated features (not that there is much difference between  
removing methods and adding methods.)  Basically, we have code in  
Lucene that we all agree should not be used, yet we leave it sitting  
in there for, on average, 2 years or more.  Dropping that to 1 year,  
on average, is not going to all of a sudden break everyone and make  
Lucene unstable.  Besides, if people like the deprecated methods so  
much, why upgrade in the first place?  No one is forcing them.   
Usually, the answer is there is some new feature somewhere else that  
is needed, but that just shows that people are willing to invest the  
time to get better features in the first place.  Besides, just because  
we can remove things, doesn't mean we have to remove them.  For  
instance, some bigger features that we improve, we may want to  
deprecate for more than one full release cycle.
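
One way to announce a removal well in advance is to state it in the
deprecation Javadoc itself. A minimal sketch, assuming a made-up class and
made-up version numbers (deprecated in 2.4, earliest removal in 2.6, i.e.
two release cycles later):

```java
// Hypothetical sketch: ExampleDocument and the version numbers are invented.
final class ExampleDocument {
    private final String text;

    ExampleDocument(String text) {
        this.text = text;
    }

    /** Preferred accessor. */
    String text() {
        return text;
    }

    /**
     * Old accessor, kept so existing callers still compile and run.
     *
     * @deprecated Deprecated in 2.4; assumed earliest removal is 2.6, i.e.
     *             after two release cycles. Use {@link #text()} instead.
     */
    @Deprecated
    String getText() {
        // Delegates to the replacement so both accessors always agree.
        return text();
    }
}
```

Callers of getText() get a compiler warning for the whole announcement
window, and the Javadoc tells them exactly when the method may disappear.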

Fieldable is, in my mind, a prime example of needing the ability to  
announce additions.  Let's say we come up with some new-fangled Field  
type called Magic.  This thing is so beautiful we all wonder how we  
ever lived without it.  Great.  Now all we need to do is add an  
isMagic() declaration onto Fieldable and we're good.  Oops, can't do  
that.  Gotta wait 2 more years for 4.0.  Seriously, we have to be  
locked into an interface for 2 years or more?   And oh by the way, in  
that 2 years, Lucene has been left in the dust b/c every other open  
source search engine out there already has Magic capabilities.   
Furthermore, that gives us one 6 month window (+/-) to get it right  
for the next 2 years.  I know, it's a bit over the top, but I think it  
demonstrates the point.  I also don't see what is unstable about  
telling the community, well in advance, that the following API changes  
are coming, please plan accordingly.  Most projects out there don't  
even do that.  Seriously, in Maven you get updates to functionality  
without even knowing you are getting them and I am nowhere near  
advocating for that.
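
To show why the isMagic() case is painful for an interface but not for an
abstract class, here is a minimal sketch. FieldLike, AbstractFieldLike,
PlainField and MagicField are made-up names for illustration; they are not
Lucene types.

```java
// Hypothetical sketch; none of these types exist in Lucene.
interface FieldLike {
    String name();
    // Adding "boolean isMagic();" here would break every existing
    // implementer at compile time, so it would have to wait for a
    // release that is allowed to break compatibility.
}

abstract class AbstractFieldLike implements FieldLike {
    /** Added later with a safe default; older subclasses still compile. */
    public boolean isMagic() {
        return false;
    }
}

// Written against the older release; knows nothing about isMagic().
final class PlainField extends AbstractFieldLike {
    private final String name;

    PlainField(String name) {
        this.name = name;
    }

    public String name() {
        return name;
    }
}

// Written against the newer release; opts in by overriding the default.
final class MagicField extends AbstractFieldLike {
    public String name() {
        return "magic";
    }

    @Override
    public boolean isMagic() {
        return true;
    }
}
```

The catch is that code holding only a FieldLike reference cannot call
isMagic(); callers have to work with the abstract class type instead.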

And again, I still think the more pertinent issue that needs to be  
addressed is how to better handle bugs in things like Tokenization,  
etc. where people may have dependencies on broken functionality, but  
haven't fully comprehended that they have such a dependency.  I don't  
think those fixes/deprecations should have to wait for a major release.

Does anyone have experience w/ how other open source projects deal  
with this?  Do they have explicit policies?  Is it ad hoc?  I know, for
instance, that in a certain library I was using, what was announced as
a beta and what shipped as 1.0 were quite different.  Granted,
one could expect that a bit more out of something that was going from  
0.X to 1.0, but still it was a pretty significant, more or less  
unannounced change (unless you count following all the commit messages  
as announcement) and it will require a decent amount of work to upgrade.

-Grant

On Jan 17, 2008, at 6:58 PM, Steven A Rowe wrote:

> Hi Grant,
>
> On 01/17/2008 at 7:51 AM, Grant Ingersoll wrote:
>> Our minor release cycles are currently in the 3-6 months range
>> and our major release cycles are in the 1-1.5 year range.
>
> Since 2.0.0, including 2.3.0 - assuming it will be released in the  
> next week or so - the minor release intervals will have averaged  
> about 6.5 months, over three releases.
>
> Historically, the major release cycle intervals have roughly been:
>
>   1.0   - 6 months (March 2000 - October 2000)
>   2.0.0 - 6 years  (October 2000 - May 2006)
>
> Six years is an incredibly long time to maintain backward  
> compatibility.
>
> Assuming there will be a 2.4 release, and then 3.0 following it,  
> it's pretty optimistic (IMHO) to think that it will be released  
> before June 2008, so for 3.0, that would be:
>
>   3.0.0 - 2 years (May 2006 - May 2008)
>
> Two years doesn't seem so long in comparison :).
>
>> I think giving someone 4-8 (or whatever) months is more than
>> enough time to prepare for API changes.   I am not sure how
>> this would affect Index changes, but I do think we should
>> KEEP our current index reading policy where possible.  This
>> may mean that some deprecated items cannot be removed until
>> a major release and I think that is fine.
>
> Given the 6.5 month average minor release interval for the most  
> recent major release, and the relatively low probability that this  
> will shrink appreciably, you seem in essence to be advocating
> altogether abandoning backward API compatibility from one (minor)  
> release to the next.
>
> However, below you are advocating a minimum of one "test balloon"  
> release between incompatible changes:
>
> On 01/17/2008 at 3:41 PM, Grant Ingersoll wrote:
>> [N]o interface/deprecation changes would be done without announcing  
>> it
>> and there being at least one release in the meantime.  Thus, if we
>> wanted to add isFancySchmancy() onto Fieldable today, it would have  
>> to
>> be announced, patch provided and referenced, a release without it  
>> (i.e.
>> 2.3) and then it would be available in 2.4.  By ad-hoc, I meant  
>> that we
>> wouldn't just announce it and then have it show up in 2.3 and not  
>> give
>> people time to digest it.
>
> If I understand you correctly, a major release series could contain  
> a whole series of non-aligned overlapping back-incompatible changes,  
> since you are allowing individual features to alter backward  
> compatibility independently of other features.  I think this is
> actually worse than just abandoning back-compatibility, since users  
> would have to look up information on each individual feature to be  
> able to figure out whether they can do a drop-in upgrade.
>
> Steve

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ


RE: Back Compatibility

Posted by Steven A Rowe <sa...@syr.edu>.

Hi Grant,

On 01/17/2008 at 7:51 AM, Grant Ingersoll wrote:
> Our minor release cycles are currently in the 3-6 months range
> and our major release cycles are in the 1-1.5 year range.

Since 2.0.0, including 2.3.0 - assuming it will be released in the next week or so - the minor release intervals will have averaged about 6.5 months, over three releases.

Historically, the major release cycle intervals have roughly been:

   1.0   - 6 months (March 2000 - October 2000)
   2.0.0 - 6 years  (October 2000 - May 2006)

Six years is an incredibly long time to maintain backward compatibility.

Assuming there will be a 2.4 release, and then 3.0 following it, it's pretty optimistic (IMHO) to think that it will be released before June 2008, so for 3.0, that would be:

   3.0.0 - 2 years (May 2006 - May 2008)

Two years doesn't seem so long in comparison :).

> I think giving someone 4-8 (or whatever) months is more than
> enough time to prepare for API changes.   I am not sure how
> this would affect Index changes, but I do think we should
> KEEP our current index reading policy where possible.  This
> may mean that some deprecated items cannot be removed until
> a major release and I think that is fine.

Given the 6.5 month average minor release interval for the most recent major release, and the relatively low probability that this will shrink appreciably, you seem in essence to be advocating altogether abandoning backward API compatibility from one (minor) release to the next.

However, below you are advocating a minimum of one "test balloon" release between incompatible changes:

On 01/17/2008 at 3:41 PM, Grant Ingersoll wrote:
> [N]o interface/deprecation changes would be done without announcing it
> and there being at least one release in the meantime.  Thus, if we
> wanted to add isFancySchmancy() onto Fieldable today, it would have to
> be announced, patch provided and referenced, a release without it (i.e.
> 2.3) and then it would be available in 2.4.  By ad-hoc, I meant that we
> wouldn't just announce it and then have it show up in 2.3 and not give
> people time to digest it.

If I understand you correctly, a major release series could contain a whole series of non-aligned overlapping back-incompatible changes, since you are allowing individual features to alter backward compatibility independently of other features.  I think this is actually worse than just abandoning back-compatibility, since users would have to look up information on each individual feature to be able to figure out whether they can do a drop-in upgrade.

Steve
