Posted to dev@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2009/11/16 19:10:25 UTC

Why release 3.0?

One of my "specialties" is asking obvious questions just to see if
everyone's assumptions are aligned. So with the discussion about branching
3.0 I have to ask "Is there going to be any 3.0 release intended for
*production*?". And if not, would we save a lot of work by just not
worrying about retrofitting fixes to a 3.0 branch and carrying on with 3.1
as the first *supported* 3.x release?

Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not sure
*as a user* I see a good reason to upgrade to 3.0. Getting a
"beta/snapshot" release to get a head start on cleaning up my code does
seem worthwhile, if I have the spare time. And having a base 3.0 version
that's not changing all over the place would be useful for that.

That said, I'm also not terribly comfortable with a "release" that's out
there and unsupported.

Apologies if this has already been discussed, but I don't remember it.
Although my memory isn't what it used to be (but some would claim it
never was<G>)...

Erick

Re: Why release 3.0?

Posted by Mark Miller <ma...@gmail.com>.

X.n must be able to read (X-1).n - so 3.1 will be able to read 2.9 -
major versions are also for removing deprecations.

Jake Mannix wrote:
> Yeah, sorry, I just meant that 3.0 can read 2.9 index format, but 3.1
> will not necessarily have that capability (this is the whole point of
> the difference between 2.9 and 3.0, in my understanding).
>
> On Mon, Nov 16, 2009 at 11:05 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>     2.9 does **not** have the same format as 3.0: an index created with
>     3.0 cannot be read with 2.9. This is because compressed field support
>     was removed and therefore the version number of the stored fields
>     file was upgraded. But indexes from 2.9 can be read with 3.0; support
>     for them may be removed in 4.0. 3.0 indexes can be read up to
>     version 4.9.
>
>      
>
>     Uwe
>
>     -----
>     Uwe Schindler
>     H.-H.-Meier-Allee 63, D-28213 Bremen
>     http://www.thetaphi.de
>     eMail: uwe@thetaphi.de
>
>     ------------------------------------------------------------------------
>
>     *From:* Jake Mannix [mailto:jake.mannix@gmail.com]
>     *Sent:* Monday, November 16, 2009 7:15 PM
>
>     *To:* java-dev@lucene.apache.org
>     *Subject:* Re: Why release 3.0?
>
>      
>
>     Don't users need to upgrade to 3.0 because 3.1 won't necessarily be
>     able to read your 2.4 index file formats?  I suppose if you've
>     already upgraded to 2.9, then all is well because 2.9 is the same
>     format as 3.0, but we can't assume all users upgraded from 2.4 to 2.9.
>
>     If you've done that already, then 3.0 might not be necessary, but if
>     you're on 2.4 right now, you will be in for a bad surprise if you try
>     to upgrade to 3.1.
>
>       -jake
>


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
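Mark's rule above (X.n must be able to read (X-1).n), read together with Uwe's note that 2.9 cannot read 3.0 indexes while 3.0 indexes stay readable through 4.9, can be captured in a small predicate. This is only a sketch under stated assumptions: `IndexCompat` and `guaranteedReadable` are hypothetical names, not Lucene API; it assumes the rule applies per major version (any (X-1).* index) and that there is no forward compatibility. It answers whether the stated rule *guarantees* readability, not whether a given release happens to read older indexes.

```java
/**
 * Hypothetical helper, not Lucene API. It encodes one reading of the
 * back-compat rule quoted in this thread: a release must read indexes
 * written by any earlier release of its own major version or of the
 * previous major version, and nothing newer than itself.
 */
final class IndexCompat {

    private IndexCompat() {}

    /** Parses "major.minor"; any further components (e.g. "2.9.1") are ignored. */
    private static int[] parse(String version) {
        String[] parts = version.split("\\.");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    /** True if the rule guarantees that readerVersion can open an index from writerVersion. */
    static boolean guaranteedReadable(String readerVersion, String writerVersion) {
        int[] r = parse(readerVersion);
        int[] w = parse(writerVersion);
        // No forward compatibility: e.g. 2.9 cannot read an index written by 3.0.
        boolean writerIsNewer = w[0] > r[0] || (w[0] == r[0] && w[1] > r[1]);
        if (writerIsNewer) {
            return false;
        }
        // Backward compatibility is only guaranteed one major version back.
        return w[0] >= r[0] - 1;
    }

    public static void main(String[] args) {
        System.out.println(guaranteedReadable("3.1", "2.9")); // true
        System.out.println(guaranteedReadable("2.9", "3.0")); // false
        System.out.println(guaranteedReadable("4.9", "3.0")); // true
    }
}
```

Under this reading, the 2.4 index Jake worries about is still covered for 3.1 (`guaranteedReadable("3.1", "2.4")` returns `true`), but not for 4.0.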

Re: Why release 3.0?

Posted by Jake Mannix <ja...@gmail.com>.

Yeah, sorry, I just meant that 3.0 can read 2.9 index format, but 3.1 will
not necessarily have that capability (this is the whole point of the
difference between 2.9 and 3.0, in my understanding).


Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

mark, these are similar to my concerns with us doing unicode 4.0 (supplementary
characters, etc.) support in 3.1.
this is why i left a comment on LUCENE-1689: I'm pretty confused about what
approach we should take, because technically, fixing this will break things.

and again, I do believe we should have fixed everything to unicode 4.0 for
Lucene 3.0, since it's the unicode version of java 5.
it's too late for that now, and i definitely don't want to cause problems for
3.1, but right now it looks unavoidable.

On Mon, Nov 16, 2009 at 3:16 PM, Mark Miller <ma...@gmail.com> wrote:

> This is a big deal, whether it's JDK or Lucene related. We are forcing
> those on 1.4 to move to 1.5 - any problems you face with the JDK in
> that move are Lucene problems if they affect Lucene. We need big, clear
> warnings about this - if I am reading this right, we should have had
> them before we pushed users to 1.5 as well.
>
> If it matters what JVM runs jflex, that is also a big deal. Even if it
> hasn't been regenerated yet, it likely will be before long. Will we
> break then? Perhaps it's better to break now?
>
> I've only read through this thread quickly, but to me, this is all a
> big deal. Think of it from a user perspective. It's not okay to just
> say: well, this stuff screws up Lucene, but it's just because the user
> is switching from 1.4 to 1.5 - that's not our concern - they should
> know the consequences. I think that is our concern - very much so.


-- 
Robert Muir
rcmuir@gmail.com
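Robert's worry about Unicode 4.0 supplementary-character support in 3.1 comes down to the difference between code that inspects one UTF-16 `char` at a time (the Java 1.4 style) and code that works on whole code points with the Java 5 APIs. The hypothetical `SupplementaryDemo` below is not Lucene code; it is a minimal sketch, assuming a Java 5 or later JRE, showing that the two approaches disagree on a string containing U+10400 (DESERET CAPITAL LETTER LONG I). That is the kind of behavior change that could alter tokenization, and so "break things" for existing indexes.

```java
/**
 * Hypothetical illustration, not Lucene code: char-at-a-time logic (the
 * Java 1.4 style) versus code-point logic (Java 5 APIs) on text that
 * contains a supplementary character.
 */
final class SupplementaryDemo {

    private SupplementaryDemo() {}

    /** Counts letters one UTF-16 char at a time; surrogates are never letters. */
    static int countLettersByChar(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    /** Counts letters per code point, stepping over surrogate pairs as one unit. */
    static int countLettersByCodePoint(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                n++;
            }
            i += Character.charCount(cp);
        }
        return n;
    }

    public static void main(String[] args) {
        // "a" followed by U+10400 DESERET CAPITAL LETTER LONG I (a surrogate pair).
        String s = "a" + new String(Character.toChars(0x10400));
        System.out.println(s.length());                 // 3
        System.out.println(countLettersByChar(s));      // 1
        System.out.println(countLettersByCodePoint(s)); // 2
    }
}
```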

Re: Why release 3.0?

Posted by Mark Miller <ma...@gmail.com>.

This is a big deal, whether it's JDK or Lucene related. We are forcing
those on 1.4 to move to 1.5 - any problems you face with the JDK in that
move are Lucene problems if they affect Lucene. We need big, clear
warnings about this - if I am reading this right, we should have had them
before we pushed users to 1.5 as well.

If it matters what JVM runs jflex, that is also a big deal. Even if it
hasn't been regenerated yet, it likely will be before long. Will we
break then? Perhaps it's better to break now?

I've only read through this thread quickly, but to me, this is all a big
deal. Think of it from a user perspective. It's not okay to just say:
well, this stuff screws up Lucene, but it's just because the user is
switching from 1.4 to 1.5 - that's not our concern - they should know the
consequences. I think that is our concern - very much so.

Robert Muir wrote:
> i suppose we are ok then, except for the fact that now
> StandardTokenizer is working with a unicode 3.0 definition, instead of
> the unicode version (4.0) that corresponds to our required minimum jre
> (1.5)...
>
> sorry if i raised a stink about nothing, but you see my concerns maybe?
>
> On Mon, Nov 16, 2009 at 3:01 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>     JFlex was not regenerated as far as I know, but if somebody did,
>     it's already broken…
>
>     ------------------------------------------------------------------------
>
>     *From:* Robert Muir [mailto:rcmuir@gmail.com]
>     *Sent:* Monday, November 16, 2009 8:53 PM
>
>     *To:* java-dev@lucene.apache.org
>     *Subject:* Re: Why release 3.0?
>
>      
>
>     btw, so here's a great example: backwards compatibility is broken
>     for StandardTokenizer regardless of JVM, because we used a 1.4 JRE
>     to run jflex in 2.9, but 1.5 in 3.0, right?
>
>     On Mon, Nov 16, 2009 at 2:51 PM, Robert Muir <rcmuir@gmail.com> wrote:
>
>     Uwe, that's probably a good solution, I think, just as long as we
>     document it somewhere. I think there is already some warning
>     verbiage about this in StandardTokenizer:
>
>     NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
>           the tokenizer, remember to use JRE 1.4 to run jflex (before
>           Lucene 3.0).  This grammar now uses constructs (eg :digit:,
>           :letter:) whose meaning can vary according to the JRE used to
>           run jflex.  See
>           https://issues.apache.org/jira/browse/LUCENE-1126 for details.
>
>      
>
>     On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>     But it is a general warning that should be placed in the Wiki: If
>     you upgrade from Java 1.4 to Java 5, think about reindexing.
>
>      
>
>     It definitely has nothing to do with 3.0, because users could have
>     changed (and most of them have) before.
>
>     ------------------------------------------------------------------------
>
>     *From:* Robert Muir [mailto:rcmuir@gmail.com]
>     *Sent:* Monday, November 16, 2009 8:45 PM
>
>
>     *To:* java-dev@lucene.apache.org
>     *Subject:* Re: Why release 3.0?
>
>      
>
>     right, my point is it's true it has nothing to do with Lucene at
>     all, really.
>
>     but the reality is we should clarify this to users, I think.
>
>     It's especially complex in the current StandardTokenizer, which
>     uses a mix of hardcoded ranges and properties: can you tell me
>     whether you should reindex for a given language X?
>     I wouldn't want to answer that question right now.
>
>     On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>     We tried Character.getType() on these two chars:
>
>      
>
>     Java 5:
>     '\u00AD' = 16
>     '\u06DD' = 16
>
>     Java 1.4:
>     '\u00AD' = 20
>     '\u06DD' = 7
>
>      
>
>     The first is the soft hyphen.
>
>     ------------------------------------------------------------------------
>
>     *From:* Robert Muir [mailto:rcmuir@gmail.com]
>     *Sent:* Monday, November 16, 2009 8:37 PM
>
>
>     *To:* java-dev@lucene.apache.org
>     *Subject:* Re: Why release 3.0?
>
>      
>
>     right, it's nothing to do with lucene; it's due to property
>     changes, etc.
>
>     i just think we should inform users on java 1.4/2.9 that if they
>     upgrade to java 1.5/3.0, they should reindex.
>
>     the reason i say this about properties is that some of them
>     change in ways that will affect tokenizers. two examples: a hyphen
>     that changes from punctuation to format (might affect
>     SolrWordDelimiterFilter), and arabic ayah, which changes from NSM
>     to format, which surely affects ArabicLetterTokenizer.
>
>     On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe@syr.edu> wrote:
>
>     Hi Robert,
>
>     I agree that the Unicode version supported by the JVM, as you say,
>     really has nothing to do with Lucene.
>
>     The disruption here is users' upgrading from Java 1.4 to 1.5+, not
>     when they upgrade Lucene.  I'd guess with few exceptions that most
>     people have been using Lucene with 1.5+ for a couple of years now,
>     though.
>
>     But even the upgrade from Java 1.4 to 1.5+ will have (had) zero
>     impact on most Lucene users, assuming that most use Latin-1
>     exclusively; although I haven't looked, I'd be surprised if
>     Latin-1 characters changed much, if at all, from Unicode 3.0 to 4.0.
>
>     It would be useful, I think, to include (a pointer to?) a
>     description of the details of the Unicode 3.0->4.0 differences in
>     the Lucene 3.0 release notes, since the minimum required Java
>     version, and so also the supported Unicode version, changes then.
>
>     Steve
>
>
>     On 11/16/2009 at 2:15 PM, Robert Muir wrote:
>     > the problem is that the properties have changed for various
>     > characters, and new characters were added.
>     >
>     > it really has nothing to do with lucene, but the idea that you can
>     > go from jdk 1.4/lucene 2.9 to jdk 1.5/lucene 3.0 without reindexing
>     > is not true.
>     >
>     >
>     > On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>     >
>     >
>     >       But a UTF-8 stream from Java 1.4 can still be read with Java 5,
>     > so what is the problem? Java 5 extended Unicode support, but an index
>     > created with older versions can still be read. UTF-8 is standardized…
>     >
>     >
>     > ________________________________
>     >
>     >
>     >       From: Robert Muir [mailto:rcmuir@gmail.com]
>     >       Sent: Monday, November 16, 2009 8:09 PM
>     >
>     >       To: java-dev@lucene.apache.org
>     >       Subject: Re: Why release 3.0?
>     >
>     >
>     >
>     >       uwe, on this topic, please read my comment on LUCENE-1689:
>     > because the unicode version was bumped in jdk 1.5, i believe this
>     > index backwards compatibility is only theoretical.
>     >


-- 
- Mark

http://www.lucidimagination.com
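Uwe's `Character.getType()` comparison quoted above is easy to repeat on any JRE. For reference, 16 is `Character.FORMAT`, 20 is `Character.DASH_PUNCTUATION` and 7 is `Character.ENCLOSING_MARK`; Uwe reported 16 for both characters on Java 5, and 20 and 7 on Java 1.4. The sketch below (`UnicodeCategoryDump` is a hypothetical name, not part of Lucene) prints one `HEX<TAB>TYPE` line per character so the output of two JREs can be diffed. Given two hex arguments such as `0600 06FF`, it dumps a whole block, which is one rough way to approach Robert's "should I reindex for language X?" question. It covers BMP characters only, and it avoids generics, varargs and `String.format` on the assumption that you may want the same source to compile with a 1.4-era compiler.

```java
/**
 * Hypothetical diagnostic, not part of Lucene: prints Character.getType()
 * for BMP characters as "HEX<TAB>TYPE" lines so two JREs' output can be
 * diffed. Sticks to pre-Java-5 language features on purpose.
 */
final class UnicodeCategoryDump {

    private UnicodeCategoryDump() {}

    /** Formats a char as four upper-case hex digits, e.g. 0xAD becomes "00AD". */
    static String hex4(char c) {
        String h = Integer.toHexString(c).toUpperCase();
        StringBuffer sb = new StringBuffer();
        for (int i = h.length(); i < 4; i++) {
            sb.append('0');
        }
        return sb.append(h).toString();
    }

    /** Returns one "HEX<TAB>TYPE" line for every char in [from, to], inclusive. */
    static String dump(char from, char to) {
        StringBuffer out = new StringBuffer();
        for (int c = from; c <= to; c++) {
            out.append(hex4((char) c)).append('\t');
            out.append(Character.getType((char) c)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        if (args.length == 2) {
            // Dump a whole range given in hex, e.g. "0600 06FF" for the Arabic block.
            char from = (char) Integer.parseInt(args[0], 16);
            char to = (char) Integer.parseInt(args[1], 16);
            System.out.print(dump(from, to));
        } else {
            // Default: the soft hyphen and ARABIC END OF AYAH from Uwe's test.
            System.out.print(dump('\u00AD', '\u00AD'));
            System.out.print(dump('\u06DD', '\u06DD'));
        }
    }
}
```

Running it once per JRE and diffing the two outputs shows exactly which categories moved. The UTF-8 bytes of the text decode the same either way; only the category table differs.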





RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

I have to regenerate the JFlex files to be sure that they are Java 5.
Should I do that and recreate the artifacts? They are not yet released.

The correct approach would be to copy the currently generated Java file and
use it if matchVersion < Version.LUCENE_30; for 3.0 and later we have a new
one. Whether the old one really is Java 1.4 can be checked by trying it out.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
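Uwe's proposal amounts to choosing the scanner at construction time from the matchVersion argument: the JFlex output generated under JRE 1.4 for older match versions, a regenerated one for 3.0 and later. A minimal sketch of that dispatch, assuming a hypothetical `MatchVersion` enum in place of Lucene's `Version` and hypothetical `LegacyScanner`/`CurrentScanner` classes standing in for the two generated files (none of these names are Lucene API):

```java
public class VersionedTokenizerSketch {

    // Hypothetical stand-in for Lucene's Version constants, declared in release order.
    enum MatchVersion { LUCENE_29, LUCENE_30, LUCENE_31 }

    // Hypothetical common interface over the two JFlex-generated scanners.
    interface Scanner {
        String name();
    }

    // Would wrap the scanner file generated with JRE 1.4 (Unicode 3.0 tables).
    static final class LegacyScanner implements Scanner {
        public String name() { return "legacy-jre14"; }
    }

    // Would wrap the scanner regenerated with JRE 1.5 (Unicode 4.0 tables).
    static final class CurrentScanner implements Scanner {
        public String name() { return "current-jre15"; }
    }

    // Mirrors "use the old file if matchVersion < LUCENE_30";
    // Enum.compareTo follows declaration order.
    static Scanner scannerFor(MatchVersion matchVersion) {
        if (matchVersion.compareTo(MatchVersion.LUCENE_30) < 0) {
            return new LegacyScanner();
        }
        return new CurrentScanner();
    }

    public static void main(String[] args) {
        for (MatchVersion v : MatchVersion.values()) {
            System.out.println(v + " -> " + scannerFor(v).name());
        }
    }
}
```

Declaring the constants in release order lets "older than 3.0" be a plain comparison; leaving the old generated file untouched is what would preserve 2.9 tokenization for callers that pass an older match version.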

  _____  

From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Monday, November 16, 2009 9:06 PM
To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

I suppose we are OK then, except for the fact that StandardTokenizer is now
working with a Unicode 3.0 definition instead of the Unicode version (4.0)
that corresponds to our required minimum JRE (1.5)...

Sorry if I raised a stink about nothing, but maybe you see my concerns?

On Mon, Nov 16, 2009 at 3:01 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

JFlex was not regenerated as far as I know, but if somebody did, it's already
broken.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Monday, November 16, 2009 8:53 PM

To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

BTW, here's a great example: StandardTokenizer's backwards compatibility is
broken regardless of the JVM, because we used a 1.4 JRE to run JFlex in 2.9,
but 1.5 in 3.0, right?

On Mon, Nov 16, 2009 at 2:51 PM, Robert Muir <rc...@gmail.com> wrote:

Uwe, that's probably a good solution, I think, as long as we document it
somewhere. I think there is already some warning verbiage about this in
StandardTokenizer:

NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
      the tokenizer, remember to use JRE 1.4 to run jflex (before
      Lucene 3.0).  This grammar now uses constructs (eg :digit:,
      :letter:) whose meaning can vary according to the JRE used to
      run jflex.  See
      https://issues.apache.org/jira/browse/LUCENE-1126 for details.

On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

But it is a general warning that should be placed in the Wiki: if you
upgrade from Java 1.4 to Java 5, think about reindexing.

It definitely has nothing to do with 3.0, because users could have switched
JVMs (and most of them have) before.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Monday, November 16, 2009 8:45 PM

To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

Right, my point is that it's true it has nothing to do with Lucene at all, really.

But the reality is that we should clarify this to users, I think.

It's especially complex in the current StandardTokenizer, which uses a mix of
hardcoded ranges and properties. Can you tell me whether you should reindex
for a given language X? I wouldn't want to answer that question right now.
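One way to approach the "should I reindex for language X?" question empirically is to dump the character properties a tokenizer relies on from each JVM and diff the two outputs. A rough sketch, assuming the properties of interest are `Character.getType`, `isLetter` and `isDigit` (StandardTokenizer's grammar also uses hardcoded ranges, which this does not capture). `CategoryDump` is a hypothetical helper; it deliberately avoids Java 5 language features so the same source can be compiled for a 1.4 runtime, and it only covers BMP `char` values:

```java
public class CategoryDump {

    // One line per char: hex code unit, Character.getType, isLetter, isDigit (tab-separated).
    // The character itself is not printed, so control characters cannot garble the output.
    static String describe(char c) {
        String hex = Integer.toHexString(c).toUpperCase();
        while (hex.length() < 4) {
            hex = "0" + hex;
        }
        return hex + "\t" + Character.getType(c) + "\t"
                + Character.isLetter(c) + "\t" + Character.isDigit(c);
    }

    // Prints every char in [from, to] inclusive; uses an int counter so to = 0xFFFF terminates.
    static void dump(int from, int to, java.io.PrintStream out) {
        for (int i = from; i <= to; i++) {
            out.println(describe((char) i));
        }
    }

    public static void main(String[] args) {
        // Default range: U+0000 through the end of the Arabic block (U+06FF).
        int from = 0x0000;
        int to = 0x06FF;
        if (args.length == 2) {
            from = Integer.parseInt(args[0], 16);
            to = Integer.parseInt(args[1], 16);
        }
        dump(from, to, System.out);
    }
}
```

For example, run `java CategoryDump 0600 06FF > jre14.txt` on the old JVM and the same command with `> jre15.txt` on the new one, then `diff jre14.txt jre15.txt`; each differing line names a code point whose classification changed.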

On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

We tried Character.getType() on these two chars:

Java 5:
'\u00AD' = 16
'\u06DD' = 16

Java 1.4:
'\u00AD' = 20
'\u06DD' = 7

The first is the soft hyphen; the second is the Arabic end-of-ayah sign.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
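The numbers above are `java.lang.Character` general-category constants: 20 is `DASH_PUNCTUATION`, 7 is `ENCLOSING_MARK` and 16 is `FORMAT`. A small sketch that decodes them and reports what the running JVM returns for the same two characters (the helper `name` is illustrative, not a JDK method):

```java
public class GeneralCategoryNames {

    // Decodes a few Character.getType results into the constant names from java.lang.Character.
    static String name(int type) {
        switch (type) {
            case Character.NON_SPACING_MARK: return "NON_SPACING_MARK (Mn)";
            case Character.ENCLOSING_MARK:   return "ENCLOSING_MARK (Me)";
            case Character.FORMAT:           return "FORMAT (Cf)";
            case Character.DASH_PUNCTUATION: return "DASH_PUNCTUATION (Pd)";
            default:                         return "other (" + type + ")";
        }
    }

    public static void main(String[] args) {
        // Values reported in the thread: 16 on Java 5; 20 and 7 on Java 1.4.
        int[] reported = {16, 20, 7};
        for (int t : reported) {
            System.out.println(t + " = " + name(t));
        }
        // What the running JVM reports for the soft hyphen and the Arabic end-of-ayah sign.
        System.out.println("U+00AD: " + name(Character.getType('\u00AD')));
        System.out.println("U+06DD: " + name(Character.getType('\u06DD')));
    }
}
```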

  _____  

From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Monday, November 16, 2009 8:37 PM

To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

Right, it's nothing to do with Lucene; it's due to property changes, etc.

I just think we should inform users on Java 1.4/Lucene 2.9 that if they upgrade
to Java 1.5/Lucene 3.0, they should reindex.

The reason I bring up properties is that some of them change in ways that will
affect tokenizers. Two examples: a hyphen that changes from punctuation to
format (might affect Solr's WordDelimiterFilter), and the Arabic ayah sign,
which changes from an enclosing mark to format, which surely affects
ArabicLetterTokenizer.

On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sa...@syr.edu> wrote:

Hi Robert,

I agree that the Unicode version supported by the JVM, as you say, really
has nothing to do with Lucene.

The disruption here comes when users upgrade from Java 1.4 to 1.5+, not when
they upgrade Lucene.  I'd guess with few exceptions that most people have been
using Lucene with 1.5+ for a couple of years now, though.

But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on
most Lucene users, assuming that most use Latin-1 exclusively; although I
haven't looked, I'd be surprised if Latin-1 characters changed much, if at
all, from Unicode 3.0 to 4.0.

It would be useful, I think, to include (a pointer to?) a description of the
details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release notes,
since the minimum required Java version, and so also the supported Unicode
version, changes then.

Steve
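Steve's Latin-1 expectation can be spot-checked per JVM. Note that the soft hyphen Uwe tested (U+00AD) is itself a Latin-1 character, and its category moved from dash punctuation (20) to format (16). A minimal sketch, assuming FORMAT is the category of interest, that lists the Latin-1 code points the running JVM classifies as `Character.FORMAT`; running it on both JVMs and comparing shows which Latin-1 characters moved into that category (`Latin1FormatScan` is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;

public class Latin1FormatScan {

    // Returns the Latin-1 code points (U+0000..U+00FF) that the running JVM
    // classifies as Character.FORMAT (general category Cf).
    static List<Integer> formatCharsInLatin1() {
        List<Integer> result = new ArrayList<Integer>();
        for (int cp = 0x00; cp <= 0xFF; cp++) {
            if (Character.getType((char) cp) == Character.FORMAT) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int cp : formatCharsInLatin1()) {
            System.out.println(String.format("U+%04X is FORMAT on this JVM", cp));
        }
    }
}
```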

On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> The problem is that the properties have changed for various characters,
> and new characters were added.
>
> It really has nothing to do with Lucene, but the idea that you can go from
> JDK 1.4/Lucene 2.9 to JDK 1.5/Lucene 3.0 without reindexing is not true.
>
>
> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>
>
>       But a UTF-8 stream from Java 1.4 can still be read with Java 5,
> so what is the problem? Java 5 extended Unicode support, but an index
> created with older versions can still be read. UTF-8 is standardized.
>
>
>
>       -----
>       Uwe Schindler
>       H.-H.-Meier-Allee 63, D-28213 Bremen
>       http://www.thetaphi.de
>       eMail: uwe@thetaphi.de
>
>
> ________________________________
>
>
>       From: Robert Muir [mailto:rcmuir@gmail.com]
>       Sent: Monday, November 16, 2009 8:09 PM
>
>       To: java-dev@lucene.apache.org
>       Subject: Re: Why release 3.0?
>
>
>
>       Uwe, on topic, please read my comment on LUCENE-1689: because the
> Unicode version was bumped in JDK 1.5, I believe this index backwards
> compatibility is only theoretical.
>
>       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>
>       2.9 does *not* have the same format as 3.0: an index created with 3.0
> cannot be read with 2.9. This is because compressed field support was
> removed and therefore the version number of the stored fields file was
> upgraded. But indexes from 2.9 can be read with 3.0, and support for them
> may be removed in 4.0. 3.0 indexes can be read until version 4.9.
>
>
>
>       Uwe
>
>       -----
>       Uwe Schindler
>       H.-H.-Meier-Allee 63, D-28213 Bremen
>       http://www.thetaphi.de
>       eMail: uwe@thetaphi.de
>
>
> ________________________________
>
>
>       From: Jake Mannix [mailto:jake.mannix@gmail.com]
>       Sent: Monday, November 16, 2009 7:15 PM
>
>
>       To: java-dev@lucene.apache.org
>
>       Subject: Re: Why release 3.0?
>
>
>
>       Don't users need to upgrade to 3.0 because 3.1 won't necessarily be
> able to read your 2.4 index file formats?  I suppose if you've already
> upgraded to 2.9, then all is well because 2.9 is the same format as 3.0,
> but we can't assume all users upgraded from 2.4 to 2.9.
>
>       If you've done that already, then 3.0 might not be necessary, but
> if you're on 2.4 right now, you will be in for a bad surprise if you try
> to upgrade to 3.1.
>
>         -jake
>

-- 
Robert Muir
rcmuir@gmail.com
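Uwe's compatibility summary can be written down as a simple rule. A toy model, assuming compatibility is decided by major version alone: a release reads indexes from its own major and the previous one, never from a newer one. This illustrates the policy as described in this thread, not Lucene's actual format check, and it ignores minor versions (for instance, it says nothing about whether 2.4 can read a 2.9 index). `IndexCompatSketch` and `canRead` are hypothetical names:

```java
public class IndexCompatSketch {

    // Toy model: a reader with major version readerMajor can open an index
    // written by major indexMajor only if it is the same major or exactly one older.
    static boolean canRead(int readerMajor, int indexMajor) {
        return indexMajor == readerMajor || indexMajor == readerMajor - 1;
    }

    public static void main(String[] args) {
        System.out.println("3.x reads a 2.9 index: " + canRead(3, 2)); // true
        System.out.println("2.x reads a 3.0 index: " + canRead(2, 3)); // false
        System.out.println("4.x reads a 3.0 index: " + canRead(4, 3)); // true
        System.out.println("4.x reads a 2.9 index: " + canRead(4, 2)); // false
    }
}
```

Taken at face value, the model would also let 3.1 read a 2.4 index, since 2.4 and 2.9 both belong to the 2.x major; whether that holds in practice depends on what Lucene's own format checks accept.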

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

I suppose we are OK then, except for the fact that StandardTokenizer is now
working with a Unicode 3.0 definition instead of the Unicode version (4.0)
that corresponds to our required minimum JRE (1.5)...

Sorry if I raised a stink about nothing, but maybe you see my concerns?

-- 
Robert Muir
rcmuir@gmail.com

RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

JFlex was not regenerated as far as I know, but if somebody did, it's already
broken.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

BTW, here's a great example: StandardTokenizer's backwards compatibility is
broken regardless of the JVM, because we used a 1.4 JRE to run JFlex in 2.9,
but 1.5 in 3.0, right?

-- 
Robert Muir
rcmuir@gmail.com

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

> Is core lucene really affected by the change? Or is it only contrib? I
> mean, if we couldn't create an index using core with surrogate pairs and
> other Unicode 4.0 stuff (though I'm not clear on the changes), how can it
> change reading/searching the index?
>
>
Yes, core is affected too, especially core analyzers like SimpleAnalyzer and StopAnalyzer.
Here is an example:

System.out.println(Character.isLetter('\u02C6'));

On JDK 1.4, this returns false.
On JDK 1.5, this returns true.

so, if someone indexes this character on Lucene 2.9 with java 1.4 using one
of these analyzers, then upgrades to 3.0 (they are forced to use java 1.5),
they must reindex to get consistent results.
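A minimal JDK-only sketch of the check above; the class and method names are illustrative, not Lucene code. It reports whether the running JVM classifies U+02C6 as a letter and as a modifier letter (general category Lm). On Java 5 or later (Unicode 4.0+ data) both should be true, which would lead a letter-based tokenizer to keep the character inside a token; per the message above, Java 1.4 answered `false`.

```java
public class ModifierLetterCheck {
    // U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT
    static final char CIRCUMFLEX = '\u02C6';

    /** True if this JVM's Unicode tables treat U+02C6 as a letter. */
    static boolean isLetterHere() {
        return Character.isLetter(CIRCUMFLEX);
    }

    /** True if this JVM assigns U+02C6 the general category Lm (modifier letter). */
    static boolean isModifierLetterHere() {
        return Character.getType(CIRCUMFLEX) == Character.MODIFIER_LETTER;
    }

    public static void main(String[] args) {
        System.out.println("java.version     = " + System.getProperty("java.version"));
        System.out.println("isLetter(U+02C6) = " + isLetterHere());
        System.out.println("category is Lm   = " + isModifierLetterHere());
    }
}
```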

btw, as for the argument that this only affects 'weird' characters, I tend to
disagree: I just searched with this character, and see many people using it
in their linkedin profiles, stuff like that (11.2M google results, who knows
if all of these are exact matches).

-- 
Robert Muir
rcmuir@gmail.com

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

actually i thought about this. i change my story.

deprecating anything is stupid, because it's still not back compatible, i.e.
Character.isLetter(char) itself returns different results now, even if we
keep invoking it.

hard break is the only solution.

we should have done this deprecation in 2.9, but it's chicken-and-egg: we
could not do it because you need java 5 to support unicode 4.

On Mon, Nov 16, 2009 at 9:57 PM, Robert Muir <rc...@gmail.com> wrote:

> completely ignoring the difficulty, I would propose to fix everything to
> correspond with the java 1.5 unicode version, for consistency.
> I would exempt StandardTokenizer, because its completely inside our
> control. we can fix it at our leisure.
>
> for the rest of this stuff, its already a 'change in runtime behavior' when
> moving from 1.4 to 1.5, even though we didn't touch code.
> i would suggest making this a one-time pain for the users so they dont have
> to do it again in 3.1
> this means for CharTokenizer adding the deprecations and reflection and
> caching for the reflection that Uwe did to make TokenStream fast and work
> like this.
> and mucking with complicated i/o buffering logic as mentioned before.
>
>
> For the other side, I'll tell you what I have done in practice.
> I usually say, there is no way in hell I will refactor some existing
> codebase to support suppl. characters.
> And i find a way to isolate just chinese, support it for only that
> language, and leave the other stuff broken.
>
> I'm not really sure that is the appropriate way to go for apache lucene,
> but I felt it was fair to at least give that perspective.
> Even if we did that, the non-chinese users still need to reindex anyway,
> except for nothing (no real gain, they still don't have unicode 4 support,
> just different behavior).
>
>
> On Mon, Nov 16, 2009 at 9:47 PM, Mark Miller <ma...@gmail.com>wrote:
>
>> So whats your best recommendation? Ignoring the difficulty and just
>> considering whats best for users?
>>
>> Robert Muir wrote:
>> > well, in all honesty there is a bit of complexity.
>> > i leave the StandardTokenizer out of this, it gives the same results
>> > regardless of JVM version.
>> > it may not be correct, but its consistent, we could wait till 5.0 or
>> > 10.0 to make it correct :)
>> > Also, because it gives the same results regardless of JVM version, we
>> > can actually use the Version logic to improve it, as Uwe showed.
>> >
>> > The rest of it is where it gets nasty,
>> > Fixing the Simple/StopAnalyzer is actually the worst, because we have
>> > to deprecate the isTokenChar(char) and normalize(char) callbacks in
>> > favor of int-based versions.
>> > We also have to fix this i/o buffering logic present in for example,
>> > CharTokenizer, which just does things like refill a buffer of size
>> > 4096 without checking to ensure it doesn't break a surrogate pair.
>> >
>> > and then we have contrib...!
>> >
>> > so you see why i ask about 'index backwards compatibility', because I
>> > don't consider it actually working between 2.9->3.0 anyway, and adding
>> > that on top of fixing this stuff, and ensuring API backwards compat,
>> > that's especially nasty.
>> >
>> >
>> >
>> >     Always depends though. This double index thing you mention is nasty
>> >     (3.0 and 3.1 for the unfortunate). I'd swallow a few careful
>> >     deprecations in 3.0 to avoid that with my vote.
>> >
>> >     --
>> >     - Mark
>> >
>> >     http://www.lucidimagination.com
>> >
>> >
>> >
>> >
>> >
>> >     ---------------------------------------------------------------------
>> >     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> >     <ma...@lucene.apache.org>
>> >     For additional commands, e-mail: java-dev-help@lucene.apache.org
>> >     <ma...@lucene.apache.org>
>> >
>> >
>> >
>> >
>> > --
>> > Robert Muir
>> > rcmuir@gmail.com <ma...@gmail.com>
>>
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>



-- 
Robert Muir
rcmuir@gmail.com

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

completely ignoring the difficulty, I would propose to fix everything to
correspond with the java 1.5 unicode version, for consistency.
I would exempt StandardTokenizer, because it's completely inside our control.
we can fix it at our leisure.

for the rest of this stuff, it's already a 'change in runtime behavior' when
moving from 1.4 to 1.5, even though we didn't touch code.
i would suggest making this a one-time pain for the users so they don't have
to do it again in 3.1.
for CharTokenizer, this means adding the deprecations, plus the reflection
(and caching of the reflection) that Uwe used to make TokenStream fast and
backwards compatible, and mucking with complicated i/o buffering logic as
mentioned before.
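The reflection-plus-caching idea can be sketched in plain Java. Everything here (`LegacyAwareTokenizer`, `accept`, the two `isTokenChar` overloads) is a hypothetical stand-in, not Lucene's CharTokenizer or Uwe's actual TokenStream code: the base class checks once per subclass, via reflection, whether that subclass still overrides the deprecated `char` callback, caches the answer, and dispatches accordingly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public abstract class LegacyAwareTokenizer {
    // Cache: concrete subclass -> does it override the deprecated char callback?
    private static final Map<Class<?>, Boolean> OVERRIDES_LEGACY =
            new ConcurrentHashMap<Class<?>, Boolean>();

    /** @deprecated override {@link #isTokenChar(int)} instead. */
    @Deprecated
    protected boolean isTokenChar(char c) {
        return Character.isLetter(c);
    }

    /** Code-point-aware callback; sees supplementary characters whole. */
    protected boolean isTokenChar(int codePoint) {
        return Character.isLetter(codePoint);
    }

    /** Walks up from the runtime class looking for a declaration of isTokenChar(char). */
    private static boolean overridesLegacy(Class<?> clazz) {
        for (Class<?> c = clazz; c != LegacyAwareTokenizer.class; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod("isTokenChar", char.class);
                return true; // some subclass still overrides the old callback
            } catch (NoSuchMethodException e) {
                // not declared at this level; keep walking up
            }
        }
        return false;
    }

    /** Routes to the legacy callback only for subclasses that still override it. */
    public final boolean accept(int codePoint) {
        Class<?> clazz = getClass();
        Boolean legacy = OVERRIDES_LEGACY.get(clazz);
        if (legacy == null) {
            legacy = overridesLegacy(clazz); // reflection runs once per class
            OVERRIDES_LEGACY.put(clazz, legacy);
        }
        if (legacy) {
            // A char callback cannot represent supplementary code points; reject them.
            return codePoint < Character.MIN_SUPPLEMENTARY_CODE_POINT
                    && isTokenChar((char) codePoint);
        }
        return isTokenChar(codePoint);
    }
}
```

Under these assumptions, a legacy subclass keeps exactly its old behavior (it never sees supplementary characters), while subclasses written against the `int` callback get whole code points.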

For the other side, I'll tell you what I have done in practice.
I usually say, there is no way in hell I will refactor some existing
codebase to support suppl. characters.
And i find a way to isolate just chinese, support it for only that language,
and leave the other stuff broken.

I'm not really sure that is the appropriate way to go for apache lucene, but
I felt it was fair to at least give that perspective.
Even if we did that, the non-chinese users would still need to reindex anyway,
and for nothing (no real gain: they still don't have unicode 4 support,
just different behavior).

On Mon, Nov 16, 2009 at 9:47 PM, Mark Miller <ma...@gmail.com> wrote:

> So whats your best recommendation? Ignoring the difficulty and just
> considering whats best for users?
>
> Robert Muir wrote:
> > well, in all honesty there is a bit of complexity.
> > i leave the StandardTokenizer out of this, it gives the same results
> > regardless of JVM version.
> > it may not be correct, but its consistent, we could wait till 5.0 or
> > 10.0 to make it correct :)
> > Also, because it gives the same results regardless of JVM version, we
> > can actually use the Version logic to improve it, as Uwe showed.
> >
> > The rest of it is where it gets nasty,
> > Fixing the Simple/StopAnalyzer is actually the worst, because we have
> > to deprecate the isTokenChar(char) and normalize(char) callbacks in
> > favor of int-based versions.
> > We also have to fix this i/o buffering logic present in for example,
> > CharTokenizer, which just does things like refill a buffer of size
> > 4096 without checking to ensure it doesn't break a surrogate pair.
> >
> > and then we have contrib...!
> >
> > so you see why i ask about 'index backwards compatibility', because I
> > don't consider it actually working between 2.9->3.0 anyway, and adding
> > that on top of fixing this stuff, and ensuring API backwards compat,
> > that's especially nasty.
> >
> >
> >
> >     Always depends though. This double index thing you mention is nasty
> >     (3.0 and 3.1 for the unfortunate). I'd swallow a few careful
> >     deprecations in 3.0 to avoid that with my vote.
> >
> >     --
> >     - Mark
> >
> >     http://www.lucidimagination.com
> >
> >
> >
> >
> >
> >
> >
> >
> > --
> > Robert Muir
> > rcmuir@gmail.com <ma...@gmail.com>
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>
>

-- 
Robert Muir
rcmuir@gmail.com

Re: Why release 3.0?

Posted by Mark Miller <ma...@gmail.com>.

So what's your best recommendation? Ignoring the difficulty and just
considering what's best for users?

Robert Muir wrote:
> well, in all honesty there is a bit of complexity.
> i leave the StandardTokenizer out of this, it gives the same results
> regardless of JVM version.
> it may not be correct, but its consistent, we could wait till 5.0 or
> 10.0 to make it correct :)
> Also, because it gives the same results regardless of JVM version, we
> can actually use the Version logic to improve it, as Uwe showed.
>
> The rest of it is where it gets nasty,
> Fixing the Simple/StopAnalyzer is actually the worst, because we have
> to deprecate the isTokenChar(char) and normalize(char) callbacks in
> favor of int-based versions.
> We also have to fix this i/o buffering logic present in for example,
> CharTokenizer, which just does things like refill a buffer of size
> 4096 without checking to ensure it doesn't break a surrogate pair.
>
> and then we have contrib...!
>
> so you see why i ask about 'index backwards compatibility', because I
> don't consider it actually working between 2.9->3.0 anyway, and adding
> that on top of fixing this stuff, and ensuring API backwards compat,
> that's especially nasty.
>
>
>
>     Always depends though. This double index thing you mention is nasty
>     (3.0 and 3.1 for the unfortunate). I'd swallow a few careful
>     deprecations in 3.0 to avoid that with my vote.
>
>     --
>     - Mark
>
>     http://www.lucidimagination.com
>
>
>
>
>
>
>
>
> -- 
> Robert Muir
> rcmuir@gmail.com <ma...@gmail.com>


-- 
- Mark

http://www.lucidimagination.com





Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

well, in all honesty there is a bit of complexity.
i leave the StandardTokenizer out of this, it gives the same results
regardless of JVM version.
it may not be correct, but it's consistent; we could wait till 5.0 or 10.0 to
make it correct :)
Also, because it gives the same results regardless of JVM version, we can
actually use the Version logic to improve it, as Uwe showed.
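The "Version logic" can be sketched as a simple switch on a match-version constant: the grammar generated under JDK 1.4 is frozen, and callers opting into the newer version get the regenerated one. `MatchVersion`, `Impl`, `Java14Impl`, and `Java15Impl` are hypothetical names for this illustration only; they are not Lucene's `Version` enum or the real `StandardTokenizerImpl` classes.

```java
public class VersionedImplSelector {
    /** Hypothetical stand-in for a match-version constant. */
    enum MatchVersion { LUCENE_29, LUCENE_30 }

    /** Hypothetical scanner interface; real generated scanners look different. */
    interface Impl {
        String name();
    }

    /** Frozen scanner generated under JDK 1.4 (Unicode 3.0 data); never regenerated. */
    static final class Java14Impl implements Impl {
        public String name() { return "unicode-3.0 grammar"; }
    }

    /** Scanner regenerated under JDK 1.5+ (Unicode 4.0 data). */
    static final class Java15Impl implements Impl {
        public String name() { return "unicode-4.0 grammar"; }
    }

    /** Older match versions keep the old tokenization so existing indexes stay consistent. */
    static Impl select(MatchVersion v) {
        return v.compareTo(MatchVersion.LUCENE_30) >= 0 ? new Java15Impl() : new Java14Impl();
    }

    public static void main(String[] args) {
        for (MatchVersion v : MatchVersion.values()) {
            System.out.println(v + " -> " + select(v).name());
        }
    }
}
```

Because the choice depends only on a constant the application passes in, not on the JVM running the code, both behaviors stay reproducible.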

The rest of it is where it gets nasty.
Fixing the Simple/StopAnalyzer is actually the worst, because we have to
deprecate the isTokenChar(char) and normalize(char) callbacks in favor of
int-based versions.
We also have to fix the i/o buffering logic present in, for example,
CharTokenizer, which just refills a buffer of size 4096 without checking
that it doesn't break a surrogate pair.
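The surrogate-pair hazard in a refill loop can be illustrated without Lucene. In this minimal sketch (the class `CodePointLetterSplitter` is hypothetical, not CharTokenizer's code), a trailing high surrogate is carried over to the next read, and characters are classified as whole code points with `Character.isLetter(int)`. A `char`-at-a-time loop would instead see two non-letter surrogate halves, and with a small buffer could split the pair across refills.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class CodePointLetterSplitter {
    /**
     * Splits the input into maximal runs of letters, reading through a fixed-size
     * buffer. A trailing high surrogate is carried over to the next refill so a
     * surrogate pair is never split across two reads.
     */
    static List<String> split(Reader in, int bufferSize) throws IOException {
        if (bufferSize < 2) {
            throw new IllegalArgumentException("buffer must hold a full surrogate pair");
        }
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        char[] buf = new char[bufferSize];
        int carry = 0; // chars held over from the previous refill (0 or 1)
        while (true) {
            int read = in.read(buf, carry, buf.length - carry);
            if (read == -1 && carry == 0) {
                break;
            }
            int len = carry + Math.max(read, 0);
            int end = len;
            // More input may follow: hold back a dangling high surrogate.
            if (read != -1 && Character.isHighSurrogate(buf[end - 1])) {
                end--;
            }
            for (int i = 0; i < end; ) {
                int cp = Character.codePointAt(buf, i, end);
                if (Character.isLetter(cp)) {
                    current.appendCodePoint(cp);
                } else if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                i += Character.charCount(cp);
            }
            carry = len - end;
            if (carry == 1) {
                buf[0] = buf[end]; // move the held-back high surrogate to the front
            }
            if (read == -1) {
                break;
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // The pair D840 DC00 encodes U+20000, a CJK Extension B ideograph (category Lo).
        String text = "ab\uD840\uDC00cd ef";
        // With a 3-char buffer, the first refill ends right after the high surrogate.
        System.out.println(split(new StringReader(text), 3));
    }
}
```

The tiny buffer in `main` only forces a refill boundary inside the pair; a real tokenizer would use a larger buffer, such as the 4096 mentioned above.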

and then we have contrib...!

so you see why i ask about 'index backwards compatibility', because I don't
consider it actually working between 2.9->3.0 anyway, and adding that on top
of fixing this stuff, and ensuring API backwards compat,
that's especially nasty.



> Always depends though. This double index thing you mention is nasty (3.0
> and 3.1 for the unfortunate). I'd swallow a few careful deprecations in
> 3.0 to avoid that with my vote.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: Why release 3.0?

Posted by Mark Miller <ma...@gmail.com>.

Robert Muir wrote:
>
>
>>     and I think it sucks they might have to reindex twice with the
>>     current status of things (we did not complete unicode 4 support
>>     in lucene 3.0)
>>     which is why i mentioned this problem on the unicode 4 issues im
>>     trying to work.
>
>     Whether 3.0 goes out as it is now or with these fixes is up to the
>     voters.
>
>
> The problem is that we want 3.0 to be a 'clean' release with no
> deprecations.
> It is impossible to do so, and also have unicode 4 support in 3.0 (we
> will need to deprecate a few things)
> We couldnt do this in 2.9, because you need jdk 1.5 or icu to do even
> basic stuff like (U)Character.isLetter(int) :)
>  
>
>
Always depends though. This double index thing you mention is nasty (3.0
and 3.1 for the unfortunate). I'd swallow a few careful deprecations in
3.0 to avoid that with my vote.

-- 
- Mark

http://www.lucidimagination.com





Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

On Mon, Nov 16, 2009 at 8:17 PM, DM Smith <dm...@gmail.com> wrote:

>
>
thanks DM, I hope to work on it more soon...

>
> I've been reading the thread and at first my response was. No big deal, it
> won't affect me (i.e. awareness of the problem). And now my thought is "I'm
> hosed" (i.e. understanding)
>

I guess it depends on what characters/writing systems you are currently
using.
As I think you know, this Unicode 3.0->4.0 change is a pretty tough upgrade.

>
> I think we need a mechanism (I mentioned this before) to build a manifest
> of the parts of the tool chain that builds each field in an index. Then if
> any part is revisioned in a fashion that is not 100% bw compat, then we'd
> know.
>
> As it is, I'm just going to mark each index as dirty on each upgrade to
> Lucene, Java or ICU. And force a rebuild.
>

for what it's worth, on an upgrade of ICU (typically a minor unicode version,
at most!) I would always reindex.
This is a major unicode version upgrade.

>
> and I think it sucks they might have to reindex twice with the current
> status of things (we did not complete unicode 4 support in lucene 3.0)
> which is why i mentioned this problem on the unicode 4 issues im trying to
> work.
>
>
> Whether 3.0 goes out as it is now or with these fixes is up to the voters.
>

The problem is that we want 3.0 to be a 'clean' release with no
deprecations.
It is impossible to do so, and also have unicode 4 support in 3.0 (we will
need to deprecate a few things)
We couldn't do this in 2.9, because you need jdk 1.5 or icu to do even basic
stuff like (U)Character.isLetter(int) :)
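To make the `isLetter(int)` point concrete, a small JDK-only sketch (names are illustrative): a `char`-based check sees a supplementary character such as U+20000 as two surrogate halves, neither of which is a letter, while the `int` overload added in Java 5 sees the whole code point.

```java
public class CharVsCodePoint {
    /** Char-at-a-time check, as a char-based callback would see the text. */
    static boolean allLettersByChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetter(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** Code-point-at-a-time check using the int overload. */
    static boolean allLettersByCodePoint(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isLetter(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String ideograph = new String(Character.toChars(0x20000));
        System.out.println("UTF-16 length: " + ideograph.length());                // 2
        System.out.println("by char:       " + allLettersByChar(ideograph));       // false
        System.out.println("by code point: " + allLettersByCodePoint(ideograph));  // true
    }
}
```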

>
>
> 2.9->3.0 (to upgrade from Unicode 3 to Unicode 4-halfass)
> 3.0->3.1 (to upgrade from Unicode 4-halfass to Unicode 4-correct)
> [hopefully]
>
>
> If this is the path, then perhaps the best advice is to skip 3.0 and take
> the pain once.
>

I do not know if this is "the path", but you see how it's virtually
impossible to add improvements and still guarantee any backwards
compatibility with any analysis stuff whatsoever, if it uses any JDK
functions.
It's not like the TokenStream API, where it's complicated yet still "under our
control". There are variables outside of lucene. This is what makes me
frustrated trying to make progress :)

>
> That's an amazing number of changes, even when you ignore name changes.
>

yeah they added over 1,000 characters!
and here is some more information in addition to the diff:
http://www.unicode.org/versions/Unicode4.0.0/

-- 
Robert Muir
rcmuir@gmail.com

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

>
> That's an amazing number of changes, even when you ignore name changes.
>

DM, for your reference, I created another diff from Unicode 4.0 to 5.1, showing
what will happen with JDK7 here: http://people.apache.org/~rmuir/unicodeDiff2.txt

the problem is that as a search engine library, lucene cares about
properties and other semantics of characters that will change across
versions. so if we leave this up to the JDK, then when they upgrade unicode
it breaks back compat.

databases and other things don't much care about these properties; it's
just utf8 bytes for the most part, so it doesn't matter for them.

-- 
Robert Muir
rcmuir@gmail.com

Re: Why release 3.0?

Posted by DM Smith <dm...@gmail.com>.

On Nov 16, 2009, at 7:53 PM, Robert Muir wrote:

> right, the only way you could really contain it would be to do something like that.

I'm looking forward to your ICU analyzer! IMHO, it'd be great to have it be a pluggable replacement for its counterparts in core. That is, using reflection, if the jar is present, then use it.

> 
> I just think we should make users aware of this, thats all. 

I've been reading the thread and at first my response was: no big deal, it won't affect me (i.e. awareness of the problem). And now my thought is "I'm hosed" (i.e. understanding).

I think we need a mechanism (I mentioned this before) to build a manifest of the parts of the tool chain that builds each field in an index. Then if any part is revisioned in a fashion that is not 100% bw compat, then we'd know.

As it is, I'm just going to mark each index as dirty on each upgrade to Lucene, Java or ICU. And force a rebuild.
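As a rough illustration of the "mark each index as dirty" policy above, here is a minimal Java sketch. The stamp file name `analysis-manifest.properties`, the property keys, and the class `IndexToolchainStamp` are hypothetical choices for this example, not a Lucene feature: the application records the Lucene version, the JVM specification version, and the ICU version (if any) used to build an index, and treats the index as dirty when any recorded value differs from the current tool chain.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

public class IndexToolchainStamp {
    static final String FILE_NAME = "analysis-manifest.properties"; // hypothetical name

    /** Captures the parts of the analysis tool chain that can change tokenization. */
    static Properties current(String luceneVersion, String icuVersion) {
        Properties p = new Properties();
        p.setProperty("lucene.version", luceneVersion);
        p.setProperty("java.specification.version",
                System.getProperty("java.specification.version"));
        p.setProperty("icu.version", icuVersion == null ? "none" : icuVersion);
        return p;
    }

    /** The index is dirty if there is no stamp or any recorded component differs. */
    static boolean isDirty(Properties stored, Properties now) {
        return stored == null || !stored.equals(now);
    }

    /** Reads the stamp stored next to the index, or returns null if there is none. */
    static Properties read(File indexDir) throws IOException {
        File f = new File(indexDir, FILE_NAME);
        if (!f.exists()) {
            return null;
        }
        InputStream in = new FileInputStream(f);
        try {
            Properties p = new Properties();
            p.load(in);
            return p;
        } finally {
            in.close();
        }
    }

    /** Records the tool chain used to build the index. */
    static void write(File indexDir, Properties p) throws IOException {
        OutputStream out = new FileOutputStream(new File(indexDir, FILE_NAME));
        try {
            p.store(out, "analysis tool chain used to build this index");
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File indexDir = new File(args.length > 0 ? args[0] : ".");
        Properties now = current("3.0.0", null); // version string supplied by the application
        if (isDirty(read(indexDir), now)) {
            System.out.println("Analysis tool chain changed; rebuild the index.");
            // After a successful rebuild, record the new tool chain:
            // write(indexDir, now);
        }
    }
}
```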

> and I think it sucks they might have to reindex twice with the current status of things (we did not complete unicode 4 support in lucene 3.0)
> which is why i mentioned this problem on the unicode 4 issues im trying to work.

Whether 3.0 goes out as it is now or with these fixes is up to the voters.

> 
> 2.9->3.0 (to upgrade from Unicode 3 to Unicode 4-halfass)
> 3.0->3.1 (to upgrade from Unicode 4-halfass to Unicode 4-correct) [hopefully]

If this is the path, then perhaps the best advice is to skip 3.0 and take the pain once.

> 
> btw, i created a diff from unicode 3's UCD to unicode 4's UCD, in case you want to see the changes: http://people.apache.org/~rmuir/unicodeDiff.txt

That's an amazing number of changes, even when you ignore name changes.

> 
> On Mon, Nov 16, 2009 at 7:42 PM, DM Smith <dm...@gmail.com> wrote:
> 
> On Nov 16, 2009, at 6:43 PM, Robert Muir wrote:
> 
> > DM, in this case I'm not referring to surrogates, etc, but instead the idea that properties for an existing character can change (the soft hyphen and arabic ayah were two examples), also new characters are introduced.
> >
> > these will affect what analysis components (ex. tokenizers) do, because they like to use categories such as .isWhiteSpace, .isLetter, things like that.
> >
> > this means these components have different behavior, because they are data-driven, even though we didnt change any code.
> 
> Then why not make ICU a dependency. At least then one has control of the delivered version. Any of us that are working with texts in non latin-1 languages are likely to be using ICU anyway.
> 
> -- DM
> 
> 
> 
> 
> 
> 
> -- 
> Robert Muir
> rcmuir@gmail.com

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

right, the only way you could really contain it would be to do something
like that.

I just think we should make users aware of this, that's all.
and I think it sucks they might have to reindex twice with the current
status of things (we did not complete unicode 4 support in lucene 3.0),
which is why i mentioned this problem on the unicode 4 issues I'm trying to
work on.

2.9->3.0 (to upgrade from Unicode 3 to Unicode 4-halfass)
3.0->3.1 (to upgrade from Unicode 4-halfass to Unicode 4-correct)
[hopefully]

btw, i created a diff from unicode 3's UCD to unicode 4's UCD, in case you
want to see the changes: http://people.apache.org/~rmuir/unicodeDiff.txt

On Mon, Nov 16, 2009 at 7:42 PM, DM Smith <dm...@gmail.com> wrote:

>
> On Nov 16, 2009, at 6:43 PM, Robert Muir wrote:
>
> > DM, in this case I'm not referring to surrogates, etc, but instead the
> idea that properties for an existing character can change (the soft hyphen
> and arabic ayah were two examples), also new characters are introduced.
> >
> > these will affect what analysis components (ex. tokenizers) do, because
> they like to use categories such as .isWhiteSpace, .isLetter, things like
> that.
> >
> > this means these components have different behavior, because they are
> data-driven, even though we didnt change any code.
>
> Then why not make ICU a dependency. At least then one has control of the
> delivered version. Any of us that are working with texts in non latin-1
> languages are likely to be using ICU anyway.
>
> -- DM
>
>
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: Why release 3.0?

Posted by DM Smith <dm...@gmail.com>.

On Nov 16, 2009, at 6:43 PM, Robert Muir wrote:

> DM, in this case I'm not referring to surrogates, etc, but instead the idea that properties for an existing character can change (the soft hyphen and arabic ayah were two examples), also new characters are introduced.
> 
> these will affect what analysis components (ex. tokenizers) do, because they like to use categories such as .isWhiteSpace, .isLetter, things like that.
> 
> this means these components have different behavior, because they are data-driven, even though we didnt change any code. 

Then why not make ICU a dependency. At least then one has control of the delivered version. Any of us that are working with texts in non latin-1 languages are likely to be using ICU anyway.

-- DM



Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

DM, in this case I'm not referring to surrogates, etc, but instead the idea
that properties for an existing character can change (the soft hyphen and
arabic ayah were two examples), also new characters are introduced.

these will affect what analysis components (ex. tokenizers) do, because they
like to use methods such as Character.isWhitespace, Character.isLetter,
things like that.

this means these components have different behavior, because they are
data-driven, even though we didn't change any code.
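A minimal JDK-only sketch for observing this data-driven behavior; the class name and probe set are illustrative. It prints the general category (`Character.getType`) that the running JVM assigns to U+00AD SOFT HYPHEN, U+06DD ARABIC END OF AYAH, and U+02C6, all of which are mentioned in this thread. Running it on two JVMs and comparing the output is one rough way to see whether an analyzer built on these methods might tokenize differently.

```java
public class UnicodePropertyProbe {
    /** Code points mentioned in this thread as examples of property changes. */
    static final int[] PROBES = { 0x00AD, 0x06DD, 0x02C6 };

    /** Returns space-separated "U+XXXX=category" pairs for the running JVM. */
    static String fingerprint() {
        StringBuilder sb = new StringBuilder();
        for (int cp : PROBES) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X=%d", cp, Character.getType(cp)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Category numbers are Character constants, e.g. 16 is Character.FORMAT (Cf).
        System.out.println(System.getProperty("java.version") + ": " + fingerprint());
    }
}
```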

On Mon, Nov 16, 2009 at 6:37 PM, DM Smith <dm...@gmail.com> wrote:

> I'm not sure that anyone is forced to go to Java 5. I think it is more that
> some will be stuck on Java 1.4. My guess is that other than those that are
> on a very old version of MacOSX (i.e. 10.3 aka Panther, Oct 2003-Apr 2005)
> everyone else is using Java 5 or Java 6 already.
>
> Is core lucene really affected by the change? Or is it only contrib? I
> mean, if we couldn't create an index using core with surrogate pairs and
> other Unicode 4.0 stuff (though I'm not clear on the changes), how can it
> change reading/searching the index?
>
> -- DM
>
> On Nov 16, 2009, at 4:36 PM, Robert Muir wrote:
>
> this fixes the standardTokenizer (thanks!!), but thats different, because
> its not dependent on the end users JVM.
>
> i think we still need a warning to users, Mark opened an issue about it,
> because other tokenizers are dependent on the end users JVM, and we are
> forcing them to upgrade to 1.5
>
> On Mon, Nov 16, 2009 at 4:33 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>
>> I opened https://issues.apache.org/jira/browse/LUCENE-2074
>>
>> It fixes the problem, the patch uses a different impl depending on
>> matchVersion.
>>
>> If I commit it now, I would regenerate the rc1 artifacts and release them
>> tomorrow to java-user. Currently the ones on people.apache.org are only
>> "known" to java-dev users.
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>> > -----Original Message-----
>> > From: Uwe Schindler [mailto:uwe@thetaphi.de]
>> > Sent: Monday, November 16, 2009 9:59 PM
>> > To: java-dev@lucene.apache.org
>> > Subject: RE: Why release 3.0?
>> >
>> > OK, I checked. The JFlex-generated file in trunk was generated with 1.4.
>> > I regenerated it with 1.5 and it was completely different! I saved the
>> > old version and renamed it to StandardTokenizerImplJava14 extends
>> > StandardTokenizerImpl.
>> >
>> > This way the impl is swapped depending on version. The 1.4 version can
>> > no longer be regenerated because it has no .jflex file, and it should
>> > never be regenerated.
>> >
>> > -----
>> > Uwe Schindler
>> > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > http://www.thetaphi.de
>> > eMail: uwe@thetaphi.de
>> >
>> > > -----Original Message-----
>> > > From: Mark Miller [mailto:markrmiller@gmail.com]
>> > > Sent: Monday, November 16, 2009 9:45 PM
>> > > To: java-dev@lucene.apache.org
>> > > Subject: Re: Why release 3.0?
>> > >
>> > > I still recommend we add a file then, HowToRegenJflex.txt or something,
>> > > that specifically says to use 1.5 or 1.6. I don't think changing the
>> > > current notice/warning is visible enough to ensure someone doesn't
>> > > break this.
>> > >
>> > > Robert Muir wrote:
>> > > > no. it's still 4.0, but i hear 1.7 will be 5.1 or 5.2
>> > > >
>> > > > the only way to truly control this would be to use something like ICU
>> > > > to control the unicode version being used (and actually be faster, and
>> > > > support a higher version).
>> > > > see http://site.icu-project.org/home/why-use-icu4j
>> > > >
>> > > > the issue is that lucene does not have 3rd party library dependencies;
>> > > > on the other hand, i think tika and/or nutch already incorporate icu
>> > > > for charset detection.
>> > > >
>> > > > i won't argue for this really, i know nobody wants it, but you can see
>> > > > how the situation of not being able to control unicode semantics is
>> > > > really difficult for a search engine.
>> > > >
>> > > > On Mon, Nov 16, 2009 at 3:33 PM, Uwe Schindler <uschindler@pangaea.de
>> > > > <ma...@pangaea.de>> wrote:
>> > > >
>> > > >     Did 1.6 change the unicode version? Robert?
>> > > >
>> > > >     -----
>> > > >     UWE SCHINDLER
>> > > >     Webserver/Middleware Development
>> > > >     PANGAEA - Publishing Network for Geoscientific and Environmental Data
>> > > >     MARUM - University of Bremen
>> > > >     Room 2500, Leobener Str., D-28359 Bremen
>> > > >     Tel.: +49 421 218 65595
>> > > >     Fax:  +49 421 218 65505
>> > > >     http://www.pangaea.de/
>> > > >     E-mail: uschindler@pangaea.de
>> > > >     <ma...@pangaea.de>
>> > > >
>> > > >     > -----Original Message-----
>> > > >     > From: Mark Miller [mailto:markrmiller@gmail.com
>> > > >     <ma...@gmail.com>]
>> > > >     > Sent: Monday, November 16, 2009 9:30 PM
>> > > >     > To: java-dev@lucene.apache.org <mailto:java-dev@lucene.apache.org>
>> > > >     > Subject: Re: Why release 3.0?
>> > > >     >
>> > > >     > And what happens when someone regenerates it with 1.6 without knowing?
>> > > >     >
>> > > >     > Uwe Schindler wrote:
>> > > >     > > I checked this by generating the file with 1.4 and 1.5. The 1.4
>> > > >     > > version will not change anymore, so we just keep the Java file and
>> > > >     > > no longer the .jflex file. The old one is used for Lucene up to
>> > > >     > > 2.9; if you use matchVersion=LUCENE_30, the new one is used, which
>> > > >     > > can also be regenerated.
>> > > >     > >
>> > > >     > > -----
>> > > >     > > Uwe Schindler
>> > > >     > > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > > >     > > http://www.thetaphi.de
>> > > >     > > eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>> > > >     > >
>> > > >     > >
>> > > >     > >> -----Original Message-----
>> > > >     > >> From: Mark Miller [mailto:markrmiller@gmail.com
>> > > >     <ma...@gmail.com>]
>> > > >     > >> Sent: Monday, November 16, 2009 9:21 PM
>> > > >     > >> To: java-dev@lucene.apache.org
>> > > >     <ma...@lucene.apache.org>
>> > > >     > >> Subject: Re: Why release 3.0?
>> > > >     > >>
>> > > >     > >> Good point - and that likely means the current warning is not
>> > > >     > >> working - what can we do to improve it?
>> > > >     > >>
>> > > >     > >> Perhaps a new text file called jflexregen or something, and it
>> > > >     > >> specifically says you must use java 1.5?
>> > > >     > >>
>> > > >     > >> Uwe Schindler wrote:
>> > > >     > >>
>> > > >     > >>> I think the code in Standard has not been generated with 1.4
>> > > >     > >>> for years :) Most developers use 1.5 or even 1.6. So it
>> > > >     > >>> already changed incompatibly.
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>> -----
>> > > >     > >>> Uwe Schindler
>> > > >     > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> > > >     > >>> http://www.thetaphi.de
>> > > >     > >>> eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>> ------------------------------------------------------------------------
>> > > >     > >>>
>> > > >     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com
>> > > >     <ma...@gmail.com>]
>> > > >     > >>> *Sent:* Monday, November 16, 2009 8:52 PM
>> > > >     > >>> *To:* java-dev@lucene.apache.org
>> > > >     <ma...@lucene.apache.org>
>> > > >     > >>> *Subject:* Re: Why release 3.0?
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>> Uwe, that's probably a good solution, I think, just as long as we
>> > > >     > >>> document it somewhere. I think there is already some warning verbiage
>> > > >     > >>> in StandardTokenizer about this:
>> > > >     > >>>
>> > > >     > >>> NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
>> > > >     > >>>       the tokenizer, remember to use JRE 1.4 to run jflex (before
>> > > >     > >>>       Lucene 3.0).  This grammar now uses constructs (eg :digit:,
>> > > >     > >>>       :letter:) whose meaning can vary according to the JRE used to
>> > > >     > >>>       run jflex.  See
>> > > >     > >>>       https://issues.apache.org/jira/browse/LUCENE-1126 for details.
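The NOTE above hinges on how JFlex builds its predefined classes. According to the JFlex manual, [:letter:] and [:digit:] are computed from java.lang.Character.isLetter and isDigit of the JRE that runs the generator, so the generated tables inherit that JRE's Unicode version. The sketch below (the class name JflexClassCensus is hypothetical, not part of Lucene or JFlex) counts how many BMP chars the running JRE puts in each class; running it under JREs with different Unicode tables would be expected to print different counts.

```java
// Hypothetical helper, not part of Lucene or JFlex: counts the BMP chars that the
// running JRE classifies as letters or digits. JFlex's [:letter:] and [:digit:]
// classes are derived from these same java.lang.Character predicates, so the counts
// reflect the Unicode tables a generated scanner would be built from.
class JflexClassCensus {

    /** Number of chars in U+0000..U+FFFF for which Character.isLetter is true. */
    static int countLetters() {
        int n = 0;
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (Character.isLetter((char) c)) {
                n++;
            }
        }
        return n;
    }

    /** Number of chars in U+0000..U+FFFF for which Character.isDigit is true. */
    static int countDigits() {
        int n = 0;
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (Character.isDigit((char) c)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("JRE " + System.getProperty("java.specification.version"));
        System.out.println("[:letter:] size in BMP: " + countLetters());
        System.out.println("[:digit:]  size in BMP: " + countDigits());
    }
}
```

Comparing the printed sizes across two JREs is a quick, if coarse, way to tell whether a regenerated scanner is likely to differ.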
>> > > >     > >>>
>> > > >     > >>> On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> > > >     > >>>
>> > > >     > >>> But it is a general warning that should be placed in the Wiki: If you
>> > > >     > >>> upgrade from Java 1.4 to Java 5, think about reindexing.
>> > > >     > >>>
>> > > >     > >>> It definitely has nothing to do with 3.0, because users could have
>> > > >     > >>> changed JVMs (and most of them have) before.
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>> ------------------------------------------------------------------------
>> > > >     > >>>
>> > > >     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
>> > > >     > >>> *Sent:* Monday, November 16, 2009 8:45 PM
>> > > >     > >>> *To:* java-dev@lucene.apache.org
>> > > >     > >>> *Subject:* Re: Why release 3.0?
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>> right, my point is it's true it's nothing to do with Lucene at all,
>> > > >     > >>> really. but the reality is we should clarify this to users, I think.
>> > > >     > >>>
>> > > >     > >>> It's especially complex in the current StandardTokenizer, which uses
>> > > >     > >>> a mix of hardcoded ranges and properties: can you tell me if you
>> > > >     > >>> should reindex for a given language X?
>> > > >     > >>> I wouldn't want to answer that question right now.
>> > > >     > >>>
>> > > >     > >>> On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> > > >     > >>>
>> > > >     > >>> We tried out: Character.getType() for these two chars:
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>> Java 5:
>> > > >     > >>> '\u00AD' = 16
>> > > >     > >>> '\u06DD' = 16
>> > > >     > >>>
>> > > >     > >>> Java 1.4:
>> > > >     > >>> '\u00AD' = 20
>> > > >     > >>> '\u06DD' = 7
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>> The first is the soft hyphen.
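The numbers above are java.lang.Character general-category constants: 16 is FORMAT, 20 is DASH_PUNCTUATION and 7 is ENCLOSING_MARK. A minimal sketch (the class name CategoryCheck is hypothetical) that prints the category of both characters on whatever JVM runs it; on Java 5 and later both should report FORMAT, matching the Java 5 column above.

```java
// Hypothetical helper: prints the general category the running JVM assigns to
// U+00AD SOFT HYPHEN and U+06DD ARABIC END OF AYAH, the two chars discussed above.
class CategoryCheck {

    /** Names the few Character.getType codes that appear in this thread. */
    static String name(int type) {
        switch (type) {
            case Character.NON_SPACING_MARK: return "NON_SPACING_MARK"; // 6
            case Character.ENCLOSING_MARK:   return "ENCLOSING_MARK";   // 7
            case Character.FORMAT:           return "FORMAT";           // 16
            case Character.DASH_PUNCTUATION: return "DASH_PUNCTUATION"; // 20
            default:                         return "type " + type;
        }
    }

    public static void main(String[] args) {
        char[] chars = { '\u00AD', '\u06DD' };
        for (char ch : chars) {
            int type = Character.getType(ch);
            System.out.println(String.format("U+%04X -> %d (%s)", (int) ch, type, name(type)));
        }
    }
}
```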
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>> ------------------------------------------------------------------------
>> > > >     > >>>
>> > > >     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
>> > > >     > >>> *Sent:* Monday, November 16, 2009 8:37 PM
>> > > >     > >>> *To:* java-dev@lucene.apache.org
>> > > >     > >>> *Subject:* Re: Why release 3.0?
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>> right, it's nothing to do with lucene, it's instead due to property
>> > > >     > >>> changes, etc.
>> > > >     > >>>
>> > > >     > >>> i just think we should inform users on java 1.4/2.9 that if they
>> > > >     > >>> upgrade to java 1.5/3.0, they should reindex.
>> > > >     > >>>
>> > > >     > >>> the reason i say this about properties is that some of them change
>> > > >     > >>> in ways that will affect tokenizers. i give two examples: a hyphen
>> > > >     > >>> that changes from punctuation to format (might affect
>> > > >     > >>> SolrWordDelimiterFilter), and arabic ayah, which changes from NSM to
>> > > >     > >>> format, which surely affects ArabicLetterTokenizer.
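One way to act on this advice without guessing per language is to record a fingerprint of the JVM's character-property table when the index is built and compare it later. The sketch below is hypothetical (nothing in this thread says Lucene 2.9 or 3.0 offers such a facility): it hashes Character.getType for every BMP char with SHA-256, and where the fingerprint gets stored next to the index is left to the application. A mismatch only says the tables differ somewhere, not that a particular analyzer is affected.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: fingerprints the running JVM's Unicode general-category
// table so an application can notice that it is no longer on the JVM the index
// was built with. Only BMP chars are covered, as with the char-based tokenizers here.
class UnicodeFingerprint {

    static String bmpCategoryFingerprint() {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
                // getType returns a small non-negative code (0..30), so one byte suffices
                md.update((byte) Character.getType((char) c));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** True if the fingerprint saved at index time differs from the current JVM's. */
    static boolean shouldReindex(String fingerprintAtIndexTime) {
        return !bmpCategoryFingerprint().equals(fingerprintAtIndexTime);
    }

    public static void main(String[] args) {
        System.out.println(bmpCategoryFingerprint());
    }
}
```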
>> > > >     > >>>
>> > > >     > >>> On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe@syr.edu> wrote:
>> > > >     > >>>
>> > > >     > >>> Hi Robert,
>> > > >     > >>>
>> > > >     > >>> I agree that the Unicode version supported by the JVM, as you say,
>> > > >     > >>> really has nothing to do with Lucene.
>> > > >     > >>>
>> > > >     > >>> The disruption here comes when users upgrade from Java 1.4 to 1.5+,
>> > > >     > >>> not when they upgrade Lucene.  I'd guess with few exceptions that
>> > > >     > >>> most people have been using Lucene with 1.5+ for a couple of years
>> > > >     > >>> now, though.
>> > > >     > >>>
>> > > >     > >>> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero
>> > > >     > >>> impact on most Lucene users, assuming that most use Latin-1
>> > > >     > >>> exclusively; although I haven't looked, I'd be surprised if Latin-1
>> > > >     > >>> characters changed much, if at all, from Unicode 3.0 to 4.0.
>> > > >     > >>>
>> > > >     > >>> It would be useful, I think, to include (a pointer to?) a description
>> > > >     > >>> of the details of the Unicode 3.0->4.0 differences in the Lucene 3.0
>> > > >     > >>> release notes, since the minimum required Java version, and so also
>> > > >     > >>> the supported Unicode version, changes then.
>> > > >     > >>>
>> > > >     > >>> Steve
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
>> > > >     > >>>
>> > > >     > >>>> the problem is that the properties have changed for various
>> > > >     > >>>> characters, and new characters were added.
>> > > >     > >>>>
>> > > >     > >>>> it really has nothing to do with lucene, but the idea that you can go
>> > > >     > >>>> from jdk 1.4/lucene 2.9 to jdk 1.5/lucene 3.0 without reindexing is
>> > > >     > >>>> not true.
>> > > >     > >>
>> > > >     > >>>> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> > > >     > >>>
>> > > >     > >>>>       But a UTF-8 stream written with Java 1.4 can still be read with
>> > > >     > >>>> Java 5, so what is the problem? Java 5 extended Unicode support, but
>> > > >     > >>>> an index created with older versions can still be read. UTF-8 is
>> > > >     > >>>> standardized.
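Uwe's point and Robert's response concern two different layers. A minimal sketch (the class name Utf8VsAnalysis is hypothetical) of the distinction: the UTF-8 bytes of stored text round-trip identically on any JVM, but the category a tokenizer sees for a character comes from the JVM's own Unicode tables, which is what changed between 1.4 and 1.5.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: decoding stored UTF-8 is JVM-independent, while
// character classification (what tokenizers rely on) is not.
class Utf8VsAnalysis {

    /** Encodes to UTF-8 and decodes again; true if nothing was lost. */
    static boolean roundTrips(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8).equals(s);
    }

    public static void main(String[] args) {
        String text = "co\u00ADoperate"; // contains U+00AD SOFT HYPHEN at index 2
        System.out.println(roundTrips(text)); // true: the stored bytes read back the same
        // The classification comes from the running JVM: 16 (FORMAT) on Java 5+,
        // 20 (DASH_PUNCTUATION) on 1.4 according to the numbers in this thread.
        System.out.println(Character.getType(text.charAt(2)));
    }
}
```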
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>> ________________________________
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>>       From: Robert Muir [mailto:rcmuir@gmail.com]
>> > > >     > >>>>       Sent: Monday, November 16, 2009 8:09 PM
>> > > >     > >>>>       To: java-dev@lucene.apache.org
>> > > >     > >>>>       Subject: Re: Why release 3.0?
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>>       uwe, on topic, please read my comment on LUCENE-1689: because the
>> > > >     > >>>> unicode version was bumped in jdk 1.5, i believe this index backwards
>> > > >     > >>>> compatibility is only theoretical.
>> > > >     > >>>>
>> > > >     > >>>>       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>> > > >     > >>>
>> > > >     > >>>>       2.9 does *not* have the same format as 3.0: an index created with
>> > > >     > >>>> 3.0 cannot be read with 2.9. This is because compressed field support
>> > > >     > >>>> was removed and therefore the version number of the stored fields
>> > > >     > >>>> file was upgraded. But indexes from 2.9 can be read with 3.0, and
>> > > >     > >>>> support for them may get removed in 4.0. 3.0 indexes can be read
>> > > >     > >>>> until version 4.9.
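The compatibility rule Uwe describes can be written down as a small predicate. This is a simplified, hypothetical model of the policy as stated in the thread, not Lucene code (the class IndexCompat and its method are illustrative names); in particular it conservatively treats an index written by a newer minor release of the same major version as unreadable, a case the thread does not address.

```java
// Hypothetical model of the back-compat policy described in this thread:
// a release can open indexes written by itself, by older releases of the same
// major version, and by the previous major version; never by a newer release.
class IndexCompat {

    static boolean canRead(int readerMajor, int readerMinor, int writerMajor, int writerMinor) {
        boolean writerNotNewer = writerMajor < readerMajor
                || (writerMajor == readerMajor && writerMinor <= readerMinor);
        boolean writerNotTooOld = writerMajor >= readerMajor - 1;
        return writerNotNewer && writerNotTooOld;
    }

    public static void main(String[] args) {
        System.out.println(canRead(3, 0, 2, 9)); // 3.0 reading a 2.9 index: true
        System.out.println(canRead(2, 9, 3, 0)); // 2.9 reading a 3.0 index: false
        System.out.println(canRead(4, 0, 2, 9)); // 4.0 reading a 2.9 index: false under this model
        System.out.println(canRead(4, 9, 3, 0)); // 4.9 reading a 3.0 index: true
    }
}
```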
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>>       Uwe
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>> ________________________________
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>>       From: Jake Mannix [mailto:jake.mannix@gmail.com]
>> > > >     > >>>>       Sent: Monday, November 16, 2009 7:15 PM
>> > > >     > >>>>       To: java-dev@lucene.apache.org
>> > > >     > >>>>       Subject: Re: Why release 3.0?
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>>
>> > > >     > >>>>       Don't users need to upgrade to 3.0 because 3.1 won't necessarily
>> > > >     > >>>> be able to read your 2.4 index file formats?  I suppose if you've
>> > > >     > >>>> already upgraded to 2.9, then all is well because 2.9 is the same
>> > > >     > >>>> format as 3.0, but we can't assume all users upgraded from 2.4 to
>> > > >     > >>>> 2.9.
>> > > >     > >>>>
>> > > >     > >>>>       If you've done that already, then 3.0 might not be necessary,
>> > > >     > >>>> but if you're on 2.4 right now, you will be in for a bad surprise if
>> > > >     > >>>> you try to upgrade to 3.1.
>> > > >     > >>>>
>> > > >     > >>>>         -jake
>> > > >     > >>>>
>> > > >     > >>>>       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson <erickerickson@gmail.com> wrote:
>> > > >     > >>>>
>> > > >     > >>>>       One of my "specialties" is asking obvious questions just to see
>> > > >     > >>>> if everyone's assumptions are aligned. So with the discussion about
>> > > >     > >>>> branching 3.0 I have to ask "Is there going to be any 3.0 release
>> > > >     > >>>> intended for *production*?". And if not, would we save a lot of
>> > > >     > >>>> work by just not worrying about retrofitting fixes to a 3.0 branch
>> > > >     > >>>> and carrying on with 3.1 as the first *supported* 3.x release?
>> > > >     > >>>>
>> > > >     > >>>>       Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
>> > > >     > >>>> sure *as a user* I see a good reason to upgrade to 3.0. Getting a
>> > > >     > >>>> "beta/snapshot" release to get a head start on cleaning up my code
>> > > >     > >>>> does seem worthwhile, if I have the spare time. And having a base
>> > > >     > >>>> 3.0 version that's not changing all over the place would be useful
>> > > >     > >>>> for that.
>> > > >     > >>>>
>> > > >     > >>>>       That said, I'm also not terribly comfortable with a "release"
>> > > >     > >>>> that's out there and unsupported.
>> > > >     > >>>>
>> > > >     > >>>>       Apologies if this has already been discussed, but I don't
>> > > >     > >>>> remember it. Although my memory isn't what it used to be (but
>> > > >     > >>>> some would claim it never was<G>)...
>> > > >     > >>>>
>> > > >     > >>>>       Erick
>> > > >     > >>>>
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >>> --
>> > > >     > >>> Robert Muir
>> > > >     > >>> rcmuir@gmail.com
>> > > >     > >>>
>> > > >     > >>>
>> > > >     > >> --
>> > > >     > >> - Mark
>> > > >     > >>
>> > > >     > >> http://www.lucidimagination.com
>> > > >     > >>
>> > > >     > >>
>> > > >     > >>
>> > > >     > >>
>> > > >     > >>
>> > > >     > >> ---------------------------------------------------------------------
>> > > >     > >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> > > >     > >> For additional commands, e-mail: java-dev-help@lucene.apache.org
>> > > >     > >>
>> > > >     > >
>> > > >     > >
>> > > >     > >
>> > > >     > >
>> > > >     > >
>> > > >     > >
>> > > >     >
>> > > >     >
>> > > >     > --
>> > > >     > - Mark
>> > > >     >
>> > > >     > http://www.lucidimagination.com
>> > > >     >
>> > > >     >
>> > > >     >
>> > > >     >
>> > > >     >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Robert Muir
>> > > > rcmuir@gmail.com
>> > >
>> > >
>> > > --
>> > > - Mark
>> > >
>> > > http://www.lucidimagination.com
>> > >
>> > >
>> > >
>> > >
>> >
>> >
>> >
>>
>>
>>
>>
>>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: Why release 3.0?

Posted by DM Smith <dm...@gmail.com>.

I'm not sure that anyone is forced to go to Java 5. I think it is more that some will be stuck on Java 1.4. My guess is that other than those that are on a very old version of MacOSX (i.e. 10.3 aka Panther, Oct 2003-Apr 2005) everyone else is using Java 5 or Java 6 already.

Is core lucene really affected by the change? Or is it only contrib? I mean, if we couldn't create an index using core with surrogate pairs and other Unicode 4.0 stuff (though I'm not clear on the changes), how can it change reading/searching the index?
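On the surrogate-pair question: Java 5 (Unicode 4.0) added int code-point overloads such as Character.isLetter(int) and String.codePointAt, while the older char-based methods see each half of a supplementary character as a SURROGATE. A minimal sketch (the class name SupplementaryDemo is hypothetical) showing the difference for U+10400 DESERET CAPITAL LETTER LONG I; which of these APIs a given Lucene analyzer actually calls is not shown here.

```java
// Hypothetical illustration of char-based vs. code-point-based checks for a
// supplementary character (outside the BMP, so UTF-16 needs a surrogate pair).
class SupplementaryDemo {

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x10400)); // DESERET CAPITAL LETTER LONG I
        System.out.println(s.length());                           // 2: one surrogate pair
        System.out.println(s.codePointCount(0, s.length()));      // 1: one character
        System.out.println(Character.isLetter(s.charAt(0)));      // false: a lone high surrogate
        System.out.println(Character.isLetter(s.codePointAt(0))); // true: int overload (Java 5+)
    }
}
```

Char-at-a-time code therefore never treats such a character as a letter, whatever Unicode version the JVM ships.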

-- DM

On Nov 16, 2009, at 4:36 PM, Robert Muir wrote:

> this fixes the StandardTokenizer (thanks!!), but that's different, because it's not dependent on the end user's JVM.
> 
> i think we still need a warning to users, Mark opened an issue about it, because other tokenizers are dependent on the end user's JVM, and we are forcing them to upgrade to 1.5
> 
> On Mon, Nov 16, 2009 at 4:33 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> I opened https://issues.apache.org/jira/browse/LUCENE-2074
> 
> It fixes the problem: the patch uses a different impl depending on
> matchVersion.
> 
> If I commit it now, I would regenerate the rc1 artifacts and release them
> tomorrow to java-user. Currently the ones on people.apache.org are only
> "known" to java-dev users.
> 
> 
> 
> > -----Original Message-----
> > From: Uwe Schindler [mailto:uwe@thetaphi.de]
> > Sent: Monday, November 16, 2009 9:59 PM
> > To: java-dev@lucene.apache.org
> > Subject: RE: Why release 3.0?
> >
> > OK, I checked. The JFlex-generated file in trunk was generated with 1.4. I
> > regenerated it with 1.5 and it was different (completely!). I saved the old
> > version and renamed it to StandardTokenizerImplJava14 extends
> > StandardTokenizerImpl.
> >
> > This way the impl is exchanged depending on version. The 1.4 version can no
> > longer be regenerated because it has no .jflex file, and it should really
> > never be regenerated.
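The approach described here, keeping the frozen 1.4-generated scanner and picking an implementation by matchVersion, can be sketched as below. Everything in it is a hypothetical stand-in: the patch on LUCENE-2074 subclasses StandardTokenizerImpl as StandardTokenizerImplJava14 and keys off Lucene's matchVersion, whereas this sketch uses a local enum and interface so that it compiles on its own.

```java
// Hypothetical sketch of choosing a scanner implementation by match version,
// so that analysis for older match versions keeps using the old tables.
class VersionedImplChooser {

    /** Stand-in for Lucene's Version constants; declaration order = release order. */
    enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

    /** Stand-in for the JFlex-generated scanner. */
    interface ScannerImpl {
        String describe();
    }

    /** Tables generated with JRE 1.4 (Unicode 3.0); kept frozen, never regenerated. */
    static final class Java14Scanner implements ScannerImpl {
        public String describe() { return "frozen JRE 1.4 tables"; }
    }

    /** Tables generated with Java 5 (Unicode 4.0); may be regenerated. */
    static final class Java5Scanner implements ScannerImpl {
        public String describe() { return "Java 5 tables"; }
    }

    static ScannerImpl choose(MatchVersion v) {
        // Enum compareTo follows declaration order, so this means "LUCENE_30 or later".
        if (v.compareTo(MatchVersion.LUCENE_30) >= 0) {
            return new Java5Scanner();
        }
        return new Java14Scanner();
    }

    public static void main(String[] args) {
        for (MatchVersion v : MatchVersion.values()) {
            System.out.println(v + " -> " + choose(v).describe());
        }
    }
}
```

Using an interface rather than subclassing is only a choice made to keep the sketch short; the thread's patch uses inheritance.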
> >
> >
> > > -----Original Message-----
> > > From: Mark Miller [mailto:markrmiller@gmail.com]
> > > Sent: Monday, November 16, 2009 9:45 PM
> > > To: java-dev@lucene.apache.org
> > > Subject: Re: Why release 3.0?
> > >
> > > I still recommend we add a file then, HowToRegenJflex.txt or something,
> > > that specifically says to use 1.5 or 1.6. I don't think changing the current
> > > notice/warning is visible enough to ensure someone doesn't break this.
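Beyond a text file, the same rule could be enforced mechanically. A hypothetical guard (JflexRegenGuard is not part of Lucene's build) that a regeneration script might run before invoking JFlex, accepting only the JREs named here; per Robert's reply, 1.5 and 1.6 both ship Unicode 4.0.

```java
// Hypothetical pre-regeneration check: refuse to run JFlex for StandardTokenizerImpl
// unless the current JRE is one whose Unicode tables match the committed scanner.
class JflexRegenGuard {

    /** JRE specification versions accepted for regeneration (both ship Unicode 4.0). */
    static boolean allowed(String specVersion) {
        return "1.5".equals(specVersion) || "1.6".equals(specVersion);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (!allowed(spec)) {
            System.err.println("Refusing to regenerate: JRE " + spec
                    + " may use different Unicode tables; use Java 1.5 or 1.6.");
            System.exit(1);
        }
        System.out.println("OK to run JFlex with JRE " + spec);
    }
}
```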
> > >
> > > Robert Muir wrote:
> > > > no, it's still 4.0, but i hear 1.7 will be 5.1 or 5.2.
> > > >
> > > > the only way to truly control this would be to use something like ICU
> > > > to control the unicode version being used (which would actually be faster,
> > > > and support a higher version).
> > > > see http://site.icu-project.org/home/why-use-icu4j
> > > >
> > > > the issue is that lucene does not have 3rd party library dependencies;
> > > > on the other hand, i think tika and/or nutch already incorporate icu
> > > > for charset detection.
> > > >
> > > > i won't argue for this really, i know nobody wants it, but you can see
> > > > how the situation of not being able to control unicode semantics is
> > > > really difficult for a search engine.
> > > >
> > > > On Mon, Nov 16, 2009 at 3:33 PM, Uwe Schindler <uschindler@pangaea.de> wrote:
> > > >
> > > >     Did 1.6 change the unicode version? Robert?
> > > >
> > > >     -----
> > > >     UWE SCHINDLER
> > > >     Webserver/Middleware Development
> > > >     PANGAEA - Publishing Network for Geoscientific and Environmental Data
> > > >     MARUM - University of Bremen
> > > >     Room 2500, Leobener Str., D-28359 Bremen
> > > >     Tel.: +49 421 218 65595
> > > >     Fax:  +49 421 218 65505
> > > >     http://www.pangaea.de/
> > > >     E-mail: uschindler@pangaea.de
> > > >
> > > >     > -----Original Message-----
> > > >     > From: Mark Miller [mailto:markrmiller@gmail.com]
> > > >     > Sent: Monday, November 16, 2009 9:30 PM
> > > >     > To: java-dev@lucene.apache.org
> > > >     > Subject: Re: Why release 3.0?
> > > >     >
> > > >     > And what happens when someone regenerates it with 1.6 without knowing?
> > > >     >
> > > >     > Uwe Schindler wrote:
> > > >     > > I checked this by generating the file with 1.4 and 1.5. The 1.4
> > > >     > > version will not change anymore, so we just keep its java file and
> > > >     > > no longer a jflex file. The old one is used for Lucene up to 2.9; if
> > > >     > > you use matchVersion=LUCENE_30, the new one is used, which can also
> > > >     > > be regenerated.
> > > >     > >
> > > >     > >
> > > >     > >
> > > >     > >> -----Original Message-----
> > > >     > >> From: Mark Miller [mailto:markrmiller@gmail.com]
> > > >     > >> Sent: Monday, November 16, 2009 9:21 PM
> > > >     > >> To: java-dev@lucene.apache.org
> > > >     > >> Subject: Re: Why release 3.0?
> > > >     > >>
> > > >     > >> Good point - and that likely means the current warning is not
> > > >     > >> working - what can we do to improve it?
> > > >     > >>
> > > >     > >> Perhaps a new text file called jflexregen or something, that
> > > >     > >> specifically says you must use Java 1.5?
> > > >     > >>
> > > >     > >> Uwe Schindler wrote:
> > > >     > >>
> > > >     > >>> I think the regenerated code in Standard has not been generated
> > > >     > >>> with 1.4 for years :-) Most developers use 1.5 or even 1.6, so it
> > > >     > >>> already changed incompatibly.
> > > >     > >>>
> > > >     > >>>
> > > >     > >>>
> > > >     > >>>
> > > >     > >>>
> > > >     > >>> ------------------------------------------------------------------------
> > > >     > >>>
> > > >     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> > > >     > >>> *Sent:* Monday, November 16, 2009 8:52 PM
> > > >     > >>> *To:* java-dev@lucene.apache.org
> > > >     > >>> *Subject:* Re: Why release 3.0?
> > > >     > >>>
> > > >     > >>>
> > > >     > >>>
> > > >     > >>> Uwe, that's probably a good solution, I think, just as long as we
> > > >     > >>> document it somewhere. I think there is already some warning verbiage
> > > >     > >>> in StandardTokenizer about this:
> > > >     > >>>
> > > >     > >>> NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
> > > >     > >>>       the tokenizer, remember to use JRE 1.4 to run jflex (before
> > > >     > >>>       Lucene 3.0).  This grammar now uses constructs (eg :digit:,
> > > >     > >>>       :letter:) whose meaning can vary according to the JRE used to
> > > >     > >>>       run jflex.  See
> > > >     > >>>       https://issues.apache.org/jira/browse/LUCENE-1126 for details.
> > > >     > >>>
> > > >     > >>> On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > >     > >>>
> > > >     > >>> But it is a general warning that should be placed in the Wiki: If you
> > > >     > >>> upgrade from Java 1.4 to Java 5, think about reindexing.
> > > >     > >>>
> > > >     > >>> It definitely has nothing to do with 3.0, because users could have
> > > >     > >>> changed JVMs (and most of them have) before.
> > > >     > >>>
> > > >     > >>>
> > > >     > >>>
> > > >     > >>> ------------------------------------------------------------------------
> > > >     > >>>
> > > >     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> > > >     > >>> *Sent:* Monday, November 16, 2009 8:45 PM
> > > >     > >>> *To:* java-dev@lucene.apache.org
> > > >     > >>> *Subject:* Re: Why release 3.0?
> > > >     > >>>
> > > >     > >>>
> > > >     > >>>
> > > >     > >>> right, my point is it's true it's nothing to do with Lucene at all,
> > > >     > >>> really. but the reality is we should clarify this to users, I think.
> > > >     > >>>
> > > >     > >>> It's especially complex in the current StandardTokenizer, which uses
> > > >     > >>> a mix of hardcoded ranges and properties: can you tell me if you
> > > >     > >>> should reindex for a given language X?
> > > >     > >>> I wouldn't want to answer that question right now.
> > > >     > >>>
> > > >     > >>> On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > >     > >>>
> > > >     > >>> We tried out: Character.getType() for these two chars:
> > > >     > >>>
> > > >     > >>>
> > > >     > >>>
> > > >     > >>> Java 5:
> > > >     > >>> '\u00AD' = 16
> > > >     > >>> '\u06DD' = 16
> > > >     > >>>
> > > >     > >>> Java 1.4:
> > > >     > >>> '\u00AD' = 20
> > > >     > >>> '\u06DD' = 7
> > > >     > >>>
> > > >     > >>>
> > > >     > >>>
> > > >     > >>> The first is the soft hyphen.
> > > >     > >>>
> > > >     > >>>
> > > >     > >>>
> > > >     > >>> ------------------------------------------------------------------------
> > > >     > >>>
> > > >     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> > > >     > >>> *Sent:* Monday, November 16, 2009 8:37 PM
> > > >     > >>> *To:* java-dev@lucene.apache.org
> > > >     > >>> *Subject:* Re: Why release 3.0?
> > > >     > >>>
> > > >     > >>>
> > > >     > >>>
> > > >     > >>> right, it's nothing to do with lucene, it's instead due to property
> > > >     > >>> changes, etc.
> > > >     > >>>
> > > >     > >>> i just think we should inform users on java 1.4/2.9 that if they
> > > >     > >>> upgrade to java 1.5/3.0, they should reindex.
> > > >     > >>>
> > > >     > >>> the reason i say this about properties is that some of them change
> > > >     > >>> in ways that will affect tokenizers. i give two examples: a hyphen
> > > >     > >>> that changes from punctuation to format (might affect
> > > >     > >>> SolrWordDelimiterFilter), and arabic ayah, which changes from NSM to
> > > >     > >>> format, which surely affects ArabicLetterTokenizer.
> > > >     > >>>
> > > >     > >>> On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe@syr.edu> wrote:
> > > >     > >>>
> > > >     > >>> Hi Robert,
> > > >     > >>>
> > > >     > >>> I agree that the Unicode version supported by the JVM, as you say,
> > > >     > >>> really has nothing to do with Lucene.
> > > >     > >>>
> > > >     > >>> The disruption here comes when users upgrade from Java 1.4 to 1.5+,
> > > >     > >>> not when they upgrade Lucene.  I'd guess with few exceptions that
> > > >     > >>> most people have been using Lucene with 1.5+ for a couple of years
> > > >     > >>> now, though.
> > > >     > >>>
> > > >     > >>> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero
> > > >     > >>> impact on most Lucene users, assuming that most use Latin-1
> > > >     > >>> exclusively; although I haven't looked, I'd be surprised if Latin-1
> > > >     > >>> characters changed much, if at all, from Unicode 3.0 to 4.0.
> > > >     > >>>
> > > >     > >>> It would be useful, I think, to include (a pointer to?) a description
> > > >     > >>> of the details of the Unicode 3.0->4.0 differences in the Lucene 3.0
> > > >     > >>> release notes, since the minimum required Java version, and so also
> > > >     > >>> the supported Unicode version, changes then.
> > > >     > >>>
> > > >     > >>> Steve
> > > >     > >>>
> > > >     > >>>
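Steve's guess about Latin-1 can be checked on whatever JVM is at hand. Below is a minimal sketch that lists the code points in a range whose general category the running JVM reports as FORMAT (Cf). The class and method names are illustrative and are not part of Lucene. It needs Java 5 or later for the `Character.getType(int)` overload.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (not part of Lucene): lists the code points in
// [from, to] that the running JVM classifies as FORMAT (general category Cf).
final class FormatCharScan {

    static List<Integer> formatChars(int from, int to) {
        List<Integer> result = new ArrayList<>();
        for (int cp = from; cp <= to; cp++) {
            if (Character.getType(cp) == Character.FORMAT) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Latin-1 is U+0000..U+00FF.
        for (int cp : formatChars(0x0000, 0x00FF)) {
            System.out.printf("U+%04X is FORMAT on Java %s%n",
                    cp, System.getProperty("java.version"));
        }
    }
}
```

On a Java 5+ JVM, the soft hyphen U+00AD should be the one Latin-1 character this reports. By Uwe's numbers further down the thread, Java 1.4 classified it as 20 (DASH_PUNCTUATION), so the equivalent check there would come back empty.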
> > > >     > >>> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> > > >     > >>>
> > > >     > >>>> the problem is that the properties have changed for various
> > > >     > >>>> characters, and new characters were added.
> > > >     > >>>>
> > > >     > >>>> it really has nothing to do with lucene, but the idea that you can
> > > >     > >>>> go from jdk 1.4/lucene 2.9 to jdk 1.5/lucene 3.0 without reindexing
> > > >     > >>>> is not true.
> > > >     > >>>>
> > > >     > >>>> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > >     > >>>>
> > > >     > >>>>       But a UTF-8 stream from Java 1.4 can still be read with Java 5,
> > > >     > >>>> so what is the problem? Java 5 extended Unicode support, but an
> > > >     > >>>> index created with older versions can still be read. UTF-8 is
> > > >     > >>>> standardized.
> > > >     > >>>>
> > > >     > >>>>       -----
> > > >     > >>>>       Uwe Schindler
> > > >     > >>>>       H.-H.-Meier-Allee 63, D-28213 Bremen
> > > >     > >>>>       http://www.thetaphi.de
> > > >     > >>>>       eMail: uwe@thetaphi.de
> > > >     > >>>>
> > > >     > >>>>
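Uwe's point and Robert's can both hold at once. The stored UTF-8 bytes decode identically on any JVM, but the category the JVM assigns to a decoded character, which is what a tokenizer consults, can differ between versions. The small sketch below illustrates this. It assumes Java 7+ for `java.nio.charset.StandardCharsets`, which is newer than the Java 5 baseline discussed here, and the class name is illustrative.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: the UTF-8 encoding of U+00AD (soft hyphen) is fixed by
// the UTF-8 standard, independent of the JVM's Unicode character tables.
final class Utf8RoundTrip {

    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "co\u00ADoperate"; // contains U+00AD SOFT HYPHEN
        byte[] bytes = encode(word);
        // The bytes, and the round trip, are determined by UTF-8 itself.
        System.out.println(bytes.length + " bytes, round trip ok: "
                + decode(bytes).equals(word));
        // The category a tokenizer sees comes from the JVM's own tables.
        System.out.println("category of U+00AD: "
                + Character.getType(word.charAt(2)));
    }
}
```

On a Java 5+ JVM this should report 11 bytes, a successful round trip, and category 16 (FORMAT). The same bytes read on Java 1.4 would decode to the same string, but per Uwe's measurements that JVM would report 20 for the soft hyphen.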
> > > >     > >>>> ________________________________
> > > >     > >>>>
> > > >     > >>>>       From: Robert Muir [mailto:rcmuir@gmail.com]
> > > >     > >>>>       Sent: Monday, November 16, 2009 8:09 PM
> > > >     > >>>>       To: java-dev@lucene.apache.org
> > > >     > >>>>       Subject: Re: Why release 3.0?
> > > >     > >>>>
> > > >     > >>>>       uwe, on topic, please read my comment on LUCENE-1689: because
> > > >     > >>>> the unicode version was bumped in jdk 1.5, i believe this index
> > > >     > >>>> backwards compatibility is only theoretical.
> > > >     > >>>>
> > > >     > >>>>       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > >     > >>>>
> > > >     > >>>>       2.9 does *not* have the same format as 3.0: an index created
> > > >     > >>>> with 3.0 cannot be read with 2.9. This is because compressed field
> > > >     > >>>> support was removed, and therefore the version number of the stored
> > > >     > >>>> fields file was upgraded. But indexes from 2.9 can be read with 3.0,
> > > >     > >>>> and support for them may get removed in 4.0. 3.0 indexes can be read
> > > >     > >>>> until version 4.9.
> > > >     > >>>>
> > > >     > >>>>       Uwe
> > > >     > >>>>
> > > >     > >>>>       -----
> > > >     > >>>>       Uwe Schindler
> > > >     > >>>>       H.-H.-Meier-Allee 63, D-28213 Bremen
> > > >     > >>>>       http://www.thetaphi.de
> > > >     > >>>>       eMail: uwe@thetaphi.de
> > > >     > >>>>
> > > >     > >>>>
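The compatibility rules stated in this thread can be written down as a small predicate:

- 3.0 reads 2.9 indexes, but 2.9 cannot read 3.0 indexes.
- 3.0 indexes stay readable through 4.9.
- In Mark's words, X.n must be able to read (X-1).n.

This is a hypothetical simplification for illustration, not Lucene code. Lucene itself checks per-file format numbers (Uwe mentions the stored fields file's version number), not release numbers like these. The sketch also assumes that a reader never opens an index written by a newer release, and it treats every minor of the previous major alike. That second assumption goes beyond what the thread says about older 2.x minors such as Jake's 2.4 case.

```java
// Hypothetical encoding of the back-compat rule described in this thread.
// Not Lucene code: Lucene checks per-file format numbers, not release numbers.
final class IndexCompat {

    /**
     * True if a reader at readerMajor.readerMinor may open an index written
     * by indexMajor.indexMinor: the index must not be newer than the reader,
     * and its major version must equal the reader's or be one below it.
     */
    static boolean canRead(int readerMajor, int readerMinor,
                           int indexMajor, int indexMinor) {
        boolean indexIsNewer = indexMajor > readerMajor
                || (indexMajor == readerMajor && indexMinor > readerMinor);
        if (indexIsNewer) {
            return false; // e.g. 2.9 cannot read a 3.0 index
        }
        // X.n reads indexes from major X and from major X-1.
        return indexMajor == readerMajor || indexMajor == readerMajor - 1;
    }

    public static void main(String[] args) {
        System.out.println("3.0 reads 2.9: " + canRead(3, 0, 2, 9));
        System.out.println("2.9 reads 3.0: " + canRead(2, 9, 3, 0));
        System.out.println("4.9 reads 3.0: " + canRead(4, 9, 3, 0));
        System.out.println("4.0 reads 2.9: " + canRead(4, 0, 2, 9));
    }
}
```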
> > > >     > >>>> ________________________________
> > > >     > >>>>
> > > >     > >>>>       From: Jake Mannix [mailto:jake.mannix@gmail.com]
> > > >     > >>>>       Sent: Monday, November 16, 2009 7:15 PM
> > > >     > >>>>       To: java-dev@lucene.apache.org
> > > >     > >>>>       Subject: Re: Why release 3.0?
> > > >     > >>>>
> > > >     > >>>>       Don't users need to upgrade to 3.0 because 3.1 won't
> > > >     > >>>> necessarily be able to read your 2.4 index file formats?  I suppose
> > > >     > >>>> if you've already upgraded to 2.9, then all is well because 2.9 is
> > > >     > >>>> the same format as 3.0, but we can't assume all users upgraded from
> > > >     > >>>> 2.4 to 2.9.
> > > >     > >>>>
> > > >     > >>>>       If you've done that already, then 3.0 might not be necessary,
> > > >     > >>>> but if you're on 2.4 right now, you will be in for a bad surprise
> > > >     > >>>> if you try to upgrade to 3.1.
> > > >     > >>>>
> > > >     > >>>>         -jake
> > > >     > >>>>
> > > >     > >>>>       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> > > >     > >>>> <erickerickson@gmail.com> wrote:
> > > >     > >>>>
> > > >     > >>>>       One of my "specialties" is asking obvious questions just to see
> > > >     > >>>> if everyone's assumptions are aligned. So with the discussion about
> > > >     > >>>> branching 3.0 I have to ask "Is there going to be any 3.0 release
> > > >     > >>>> intended for *production*?". And if not, would we save a lot of
> > > >     > >>>> work by just not worrying about retrofitting fixes to a 3.0 branch
> > > >     > >>>> and carrying on with 3.1 as the first *supported* 3.x release?
> > > >     > >>>>
> > > >     > >>>>       Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> > > >     > >>>> sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> > > >     > >>>> "beta/snapshot" release to get a head start on cleaning up my code
> > > >     > >>>> does seem worthwhile, if I have the spare time. And having a base
> > > >     > >>>> 3.0 version that's not changing all over the place would be useful
> > > >     > >>>> for that.
> > > >     > >>>>
> > > >     > >>>>       That said, I'm also not terribly comfortable with a "release"
> > > >     > >>>> that's out there and unsupported.
> > > >     > >>>>
> > > >     > >>>>       Apologies if this has already been discussed, but I don't
> > > >     > >>>> remember it. Although my memory isn't what it used to be (but
> > > >     > >>>> some would claim it never was<G>)...
> > > >     > >>>>
> > > >     > >>>>       Erick
> > > >     > >>>>
> > > >     > >>>
> > > >     > >>> --
> > > >     > >>> Robert Muir
> > > >     > >>> rcmuir@gmail.com
> > > >     > >>>
> > > >     > >>> --
> > > >     > >>> Robert Muir
> > > >     > >>> rcmuir@gmail.com
> > > >     > >>>
> > > >     > >>> --
> > > >     > >>> Robert Muir
> > > >     > >>> rcmuir@gmail.com
> > > >     > >>>
> > > >     > >> --
> > > >     > >> - Mark
> > > >     > >>
> > > >     > >> http://www.lucidimagination.com
> > > >     > >>
> > > >     > >>
> > > >     > >>
> > > >     > >>
> > > >     > >>
> > > >     > >> ---------------------------------------------------------------------
> > > >     > >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > > >     > >> For additional commands, e-mail: java-dev-help@lucene.apache.org
> > > >     > >>
> > > >     > >
> > > >     > >
> > > >     > >
> > > >     > >
> > > >     > >
> > > >     > >
> > > >     >
> > > >     >
> > > >     > --
> > > >     > - Mark
> > > >     >
> > > >     > http://www.lucidimagination.com
> > > >     >
> > > >     >
> > > >     >
> > > >     >
> > > >     >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Robert Muir
> > > > rcmuir@gmail.com
> > >
> > >
> > > --
> > > - Mark
> > >
> > > http://www.lucidimagination.com
> > >
> > >
> > >
> > >
> >
> >
> >
> 
> 
> 
> 
> 
> 
> 
> -- 
> Robert Muir
> rcmuir@gmail.com

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

this fixes StandardTokenizer (thanks!!), but that's different, because
it's not dependent on the end user's JVM.

i think we still need a warning to users (Mark opened an issue about it),
because other tokenizers are dependent on the end user's JVM, and we are
forcing them to upgrade to 1.5.
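Here is a concrete case of that JVM dependence. A tokenizer that decides at run time, through java.lang.Character, what counts as a token character inherits whatever Unicode tables the end user's JVM ships. The sketch below is a plain loop, not a Lucene Tokenizer, and its rule (letters plus any combining mark) is a hypothetical one chosen for illustration. It is not ArabicLetterTokenizer's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter (not a Lucene Tokenizer): a token character is a
// letter or any combining mark, as classified by the JVM at run time.
final class JvmDependentSplitter {

    static boolean isTokenChar(char c) {
        if (Character.isLetter(c)) {
            return true;
        }
        int type = Character.getType(c);
        return type == Character.NON_SPACING_MARK
                || type == Character.ENCLOSING_MARK
                || type == Character.COMBINING_SPACING_MARK;
    }

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A non-token character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // ALEF, U+06DD ARABIC END OF AYAH, BEH
        System.out.println(split("\u0627\u06DD\u0628").size());
    }
}
```

On Java 5+ this should print 2, because U+06DD is FORMAT there. With the Java 1.4 value Uwe reports for that character (7, ENCLOSING_MARK), the same code would have kept all three characters together as one token. The code is identical in both cases; only the JVM differs.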

On Mon, Nov 16, 2009 at 4:33 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> I opened https://issues.apache.org/jira/browse/LUCENE-2074
>
> It fixes the problem: the patch uses a different impl depending on
> matchVersion.
>
> If I commit it now, I would regenerate the rc1 artifacts and release them
> tomorrow to java-user. Currently the ones on people.apache.org are only
> "known" to java-dev users.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Uwe Schindler [mailto:uwe@thetaphi.de]
> > Sent: Monday, November 16, 2009 9:59 PM
> > To: java-dev@lucene.apache.org
> > Subject: RE: Why release 3.0?
> >
> > OK, I checked. The JFlex-generated file in trunk had been generated with
> > 1.4. I regenerated it with 1.5 and it was different (completely!). I saved
> > the old version and renamed it to StandardTokenizerImplJava14, which
> > extends StandardTokenizerImpl.
> >
> > This way, the impl is exchanged depending on the version. The 1.4 version
> > can no longer be regenerated because it has no .jflex file, and it should
> > really never be regenerated.
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
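Uwe's approach is to keep the frozen 1.4-generated scanner as a separate class and select the implementation from matchVersion. It can be sketched without Lucene as a small factory. Everything below is hypothetical. The enum and the two scanner classes stand in for Lucene's Version, StandardTokenizerImpl and StandardTokenizerImplJava14, and they do not reproduce their real signatures.

```java
// Hypothetical stand-ins; not Lucene's real classes or signatures.
enum MatchVersion {
    LUCENE_29, LUCENE_30;

    boolean onOrAfter(MatchVersion other) {
        return compareTo(other) >= 0;
    }
}

interface TokenScanner {
    String id();
}

// Frozen scanner: generated once under JRE 1.4 and never regenerated.
final class Java14Scanner implements TokenScanner {
    public String id() { return "java14"; }
}

// Current scanner: may be regenerated, but only with a Unicode 4.0 JRE.
final class CurrentScanner implements TokenScanner {
    public String id() { return "current"; }
}

final class ScannerFactory {
    static TokenScanner forVersion(MatchVersion matchVersion) {
        // Older matchVersion values keep the old token boundaries, so text
        // analyzed now matches what was indexed with the old scanner.
        return matchVersion.onOrAfter(MatchVersion.LUCENE_30)
                ? new CurrentScanner()
                : new Java14Scanner();
    }

    public static void main(String[] args) {
        System.out.println(forVersion(MatchVersion.LUCENE_29).id());
        System.out.println(forVersion(MatchVersion.LUCENE_30).id());
    }
}
```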
> > > -----Original Message-----
> > > From: Mark Miller [mailto:markrmiller@gmail.com]
> > > Sent: Monday, November 16, 2009 9:45 PM
> > > To: java-dev@lucene.apache.org
> > > Subject: Re: Why release 3.0?
> > >
> > > I still recommend we add a file then, HowToRegenJflex.txt or something,
> > > that specifically says to use 1.5 or 1.6. I don't think just changing the
> > > current notice/warning is visible enough to ensure someone doesn't break this.
> > >
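A written note is one option. A build could also refuse to regenerate the grammar on the wrong JRE. Below is a hypothetical guard that could run before JFlex. It accepts only 1.5 and 1.6, assuming (per Robert's answer that 1.6 is still Unicode 4.0) that those two share the same tables. The class is illustrative and is not part of Lucene's build.

```java
// Hypothetical pre-regeneration check; not part of Lucene's build.
final class JflexJreGuard {

    // JREs assumed (per this thread) to ship Unicode 4.0 tables.
    static boolean isSupported(String specVersion) {
        return "1.5".equals(specVersion) || "1.6".equals(specVersion);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (!isSupported(spec)) {
            System.err.println("Refusing to run jflex on Java " + spec
                    + ": regenerate StandardTokenizerImpl with Java 1.5 or 1.6.");
            System.exit(1);
        }
        System.out.println("OK to regenerate with Java " + spec);
    }
}
```

A non-zero exit status is what would let a build tool stop before the scanner is overwritten.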
> > > Robert Muir wrote:
> > > > no, 1.6 is still unicode 4.0, but i hear 1.7 will be 5.1 or 5.2.
> > > >
> > > > the only way to truly control this would be to use something like ICU
> > > > to control the unicode version being used (and it would actually be
> > > > faster, and support a higher version).
> > > > see http://site.icu-project.org/home/why-use-icu4j
> > > >
> > > > the issue is that lucene does not have 3rd party library dependencies;
> > > > on the other hand, i think tika and/or nutch already incorporate icu
> > > > for charset detection.
> > > >
> > > > i won't argue for this really, i know nobody wants it, but you can see
> > > > how not being able to control unicode semantics is really difficult
> > > > for a search engine.
> > > >
> > > > On Mon, Nov 16, 2009 at 3:33 PM, Uwe Schindler <uschindler@pangaea.de> wrote:
> > > >
> > > >     Did 1.6 change the unicode version? Robert?
> > > >
> > > >     -----
> > > >     UWE SCHINDLER
> > > >     Webserver/Middleware Development
> > > >     PANGAEA - Publishing Network for Geoscientific and Environmental Data
> > > >     MARUM - University of Bremen
> > > >     Room 2500, Leobener Str., D-28359 Bremen
> > > >     Tel.: +49 421 218 65595
> > > >     Fax:  +49 421 218 65505
> > > >     http://www.pangaea.de/
> > > >     E-mail: uschindler@pangaea.de
> > > >
> > > >     > -----Original Message-----
> > > >     > From: Mark Miller [mailto:markrmiller@gmail.com]
> > > >     > Sent: Monday, November 16, 2009 9:30 PM
> > > >     > To: java-dev@lucene.apache.org
> > > >     > Subject: Re: Why release 3.0?
> > > >     >
> > > >     > And what happens when someone regenerates it with 1.6 without knowing?
> > > >     >
> > > >     > Uwe Schindler wrote:
> > > >     > > I checked this by generating the file with 1.4 and 1.5. The 1.4
> > > >     > > version will not change anymore, so we just keep its java file and
> > > >     > > no longer have a jflex file for it. The old one is used for Lucene
> > > >     > > versions up to 2.9; if you use matchVersion=LUCENE_30, the new one
> > > >     > > is used, which can also be regenerated.
> > > >     > >
> > > >     > > -----
> > > >     > > Uwe Schindler
> > > >     > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > >     > > http://www.thetaphi.de
> > > >     > > eMail: uwe@thetaphi.de
> > > >     > >
> > > >     > >
> > > >     > >> -----Original Message-----
> > > >     > >> From: Mark Miller [mailto:markrmiller@gmail.com]
> > > >     > >> Sent: Monday, November 16, 2009 9:21 PM
> > > >     > >> To: java-dev@lucene.apache.org
> > > >     > >> Subject: Re: Why release 3.0?
> > > >     > >>
> > > >     > >> Good point - and that likely means the current warning is not
> > > >     > >> working - what can we do to improve it?
> > > >     > >>
> > > >     > >> Perhaps a new text file called jflexregen or something, that
> > > >     > >> specifically says you must use java 1.5?
> > > >     > >>
> > > >     > >> Uwe Schindler wrote:
> > > >     > >>
> > > >     > >>> I think the code in StandardTokenizer has not been generated with
> > > >     > >>> 1.4 for years :-) Most developers use 1.5 or even 1.6. So it has
> > > >     > >>> already changed incompatibly.
> > > >     > >>>
> > > >     > >>> -----
> > > >     > >>> Uwe Schindler
> > > >     > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> > > >     > >>> http://www.thetaphi.de
> > > >     > >>> eMail: uwe@thetaphi.de
> > > >     > >>>
> > > >     > >>> ------------------------------------------------------------------------
> > > >     > >>>
> > > >     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> > > >     > >>> *Sent:* Monday, November 16, 2009 8:52 PM
> > > >     > >>> *To:* java-dev@lucene.apache.org
> > > >     > >>> *Subject:* Re: Why release 3.0?
> > > >     > >>>
> > > >     > >>> Uwe, that's probably a good solution, I think, just as long as we
> > > >     > >>> document it somewhere. I think there is some warning verbiage in
> > > >     > >>> StandardTokenizer already about this:
> > > >     > >>>
> > > >     > >>> NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
> > > >     > >>>       the tokenizer, remember to use JRE 1.4 to run jflex (before
> > > >     > >>>       Lucene 3.0).  This grammar now uses constructs (eg :digit:,
> > > >     > >>>       :letter:) whose meaning can vary according to the JRE used to
> > > >     > >>>       run jflex.  See
> > > >     > >>>       https://issues.apache.org/jira/browse/LUCENE-1126 for details.
> > > >     > >>>
> > > >     > >>> On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > >     > >>>
> > > >     > >>> But it is a general warning that should be placed in the Wiki: if
> > > >     > >>> you upgrade from Java 1.4 to Java 5, think about reindexing.
> > > >     > >>>
> > > >     > >>> It definitely has nothing to do with 3.0, because users could have
> > > >     > >>> switched JVMs (and most of them have) before.
> > > >     > >>>
> > > >     > >>> -----
> > > >     > >>> Uwe Schindler
> > > >     > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> > > >     > >>> http://www.thetaphi.de
> > > >     > >>> eMail: uwe@thetaphi.de
> > > >     > >>>
> > > >     > >>>
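Uwe's "think about reindexing" can be made into something checkable. One approach is to record a fingerprint of the JVM's character categories when an index is built, then compare it after a JVM change. The minimal sketch below assumes that a CRC32 over Character.getType for every BMP code point is a good-enough proxy. The class name, and the idea of storing the value next to the index, are hypothetical.

A fingerprint only covers code that consults java.lang.Character at run time. As Robert notes, JFlex-generated tables such as StandardTokenizer's are fixed when the grammar is generated, and hardcoded ranges are not covered either.

```java
import java.util.zip.CRC32;

// Hypothetical helper: fingerprints the running JVM's general-category table.
final class UnicodeTableFingerprint {

    /** CRC32 over Character.getType(cp) for every code point in [from, to]. */
    static long fingerprint(int from, int to) {
        CRC32 crc = new CRC32();
        for (int cp = from; cp <= to; cp++) {
            // Category values are 0..30, so the low byte used by update(int)
            // holds the whole value.
            crc.update(Character.getType(cp));
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        long bmp = fingerprint(0x0000, 0xFFFF);
        System.out.printf("java.version=%s fingerprint=%016x%n",
                System.getProperty("java.version"), bmp);
        // If a later JVM prints a different value, its category table differs,
        // and JVM-dependent analysis may produce different tokens.
    }
}
```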
> > > >     > >>> ------------------------------------------------------------------------
> > > >     > >>>
> > > >     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> > > >     > >>> *Sent:* Monday, November 16, 2009 8:45 PM
> > > >     > >>> *To:* java-dev@lucene.apache.org
> > > >     > >>> *Subject:* Re: Why release 3.0?
> > > >     > >>>
> > > >     > >>>
> > > >     > >>> right, my point is: it's true it has nothing to do with Lucene at
> > > >     > >>> all, really, but the reality is we should clarify this to users, I
> > > >     > >>> think.
> > > >     > >>>
> > > >     > >>> It's especially complex in the current StandardTokenizer, which uses
> > > >     > >>> a mix of hardcoded ranges and properties: can you tell me whether you
> > > >     > >>> should reindex for a given language X? I wouldn't want to answer that
> > > >     > >>> question right now.
> > > >     > >>>
> > > >     > >>> On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > >     > >>>
> > > >     > >>> We tried out Character.getType() for these two chars:
> > > >     > >>>
> > > >     > >>> Java 5:
> > > >     > >>> '\u00AD' = 16
> > > >     > >>> '\u06DD' = 16
> > > >     > >>>
> > > >     > >>> Java 1.4:
> > > >     > >>> '\u00AD' = 20
> > > >     > >>> '\u06DD' = 7
> > > >     > >>>
> > > >     > >>> The first is the soft hyphen.
> > > >     > >>>
> > > >     > >>> -----
> > > >     > >>> Uwe Schindler
> > > >     > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> > > >     > >>> http://www.thetaphi.de
> > > >     > >>> eMail: uwe@thetaphi.de
> > > >     > >>>
> > > >     > >>>
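The raw numbers Uwe reports are values of the general-category constants in java.lang.Character. The short sketch below prints them by name on the running JVM. The name table is this example's own and covers only the constants relevant here; it is not a JDK API.

```java
import java.util.HashMap;
import java.util.Map;

// Prints the general category the running JVM assigns to the two characters
// from Uwe's test, by constant name instead of raw number.
final class CategoryNames {

    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        NAMES.put((int) Character.NON_SPACING_MARK, "NON_SPACING_MARK");
        NAMES.put((int) Character.ENCLOSING_MARK, "ENCLOSING_MARK");
        NAMES.put((int) Character.FORMAT, "FORMAT");
        NAMES.put((int) Character.DASH_PUNCTUATION, "DASH_PUNCTUATION");
    }

    static String categoryName(char c) {
        int type = Character.getType(c);
        String name = NAMES.get(type);
        return name != null ? name : "type " + type;
    }

    public static void main(String[] args) {
        for (char c : new char[] {'\u00AD', '\u06DD'}) {
            System.out.printf("U+%04X -> %s%n", (int) c, categoryName(c));
        }
    }
}
```

On a Java 5 or later JVM, both lines should report FORMAT (16). Uwe's Java 1.4 values correspond to DASH_PUNCTUATION (20) for the soft hyphen and ENCLOSING_MARK (7) for the ayah sign.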
> > > >     > >>> ------------------------------------------------------------------------
> > > >     > >>>
> > > >     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> > > >     > >>> *Sent:* Monday, November 16, 2009 8:37 PM
> > > >     > >>> *To:* java-dev@lucene.apache.org
> > > >     > >>> *Subject:* Re: Why release 3.0?
> > > >     > >>>
> > > >     > >>>
> > > >     > >>> right, it has nothing to do with lucene; it's instead due to property
> > > >     > >>> changes, etc.
> > > >     > >>>
> > > >     > >>> i just think we should inform users on java 1.4/lucene 2.9 that if
> > > >     > >>> they upgrade to java 1.5/lucene 3.0, they should reindex.
> > > >     > >>>
> > > >     > >>> the reason i say this about properties is that some of them change
> > > >     > >>> in ways that affect tokenizers. i give two examples: a hyphen that
> > > >     > >>> changes from punctuation to format (might affect
> > > >     > >>> SolrWordDelimiterFilter), and the arabic ayah, which changes from NSM
> > > >     > >>> to format, which surely affects ArabicLetterTokenizer.
> > > >     > >>>
> > > >     > >>>
> > > >     > >>> --
> > > >     > >>> Robert Muir
> > > >     > >>> rcmuir@gmail.com
> > > >     > >>>
> > > >     > >>> --
> > > >     > >>> Robert Muir
> > > >     > >>> rcmuir@gmail.com
> > > >     > >>>
> > > >     > >>> --
> > > >     > >>> Robert Muir
> > > >     > >>> rcmuir@gmail.com
> > > >     > >>>
> > > >     > >> --
> > > >     > >> - Mark
> > > >     > >>
> > > >     > >> http://www.lucidimagination.com
> > > >     > >>
> > > >     > >>
> > > >     > >>
> > > >     > >>
> > > >     > >>
> > > >     > >>
> > > >     > >
> > > >     > >
> > > >     > >
> > > >     > >
> > > >     > >
> > > >     > >
> > > >     >
> > > >     >
> > > >     > --
> > > >     > - Mark
> > > >     >
> > > >     > http://www.lucidimagination.com
> > > >     >
> > > >     >
> > > >     >
> > > >     >
> > > >     >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Robert Muir
> > > > rcmuir@gmail.com <ma...@gmail.com>
> > >
> > >
> > > --
> > > - Mark
> > >
> > > http://www.lucidimagination.com
> > >
> > >
> > >
> > >
> >
> >
> >
>
>
>
>
>


-- 
Robert Muir
rcmuir@gmail.com

RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

I opened https://issues.apache.org/jira/browse/LUCENE-2074

It fixes the problem: the patch uses a different impl depending on
matchVersion.

If I commit it now, I would regenerate the rc1 artifacts and release them
tomorrow to java-user. Currently the ones on people.apache.org are only
"known" to java-dev users.
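A minimal sketch of the kind of version-dependent dispatch described here, assuming a plain enum in place of Lucene's real Version class. MatchVersion, ScannerImpl, ScannerImplJava14 and newScanner are hypothetical names that only stand in for StandardTokenizerImpl and StandardTokenizerImplJava14; this is not the LUCENE-2074 patch itself.

```java
// Hypothetical sketch of choosing a scanner impl by matchVersion.
// None of these names are Lucene's actual classes or signatures.
public class MatchVersionDispatch {

    /** Hypothetical stand-in for Lucene's Version constants, in release order. */
    enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

    /** Stand-in for the scanner regenerated with JFlex under Java 5. */
    static class ScannerImpl {
        String grammar() { return "java5"; }
    }

    /** Stand-in for the frozen scanner generated under Java 1.4 (no .jflex source kept). */
    static class ScannerImplJava14 extends ScannerImpl {
        @Override
        String grammar() { return "java14"; }
    }

    /** Pre-3.0 match versions keep the frozen 1.4 scanner, so text tokenizes as before. */
    static ScannerImpl newScanner(MatchVersion matchVersion) {
        // Enum constants compare in declaration (release) order, so this reads as "on or after 3.0".
        if (matchVersion.compareTo(MatchVersion.LUCENE_30) >= 0) {
            return new ScannerImpl();
        }
        return new ScannerImplJava14();
    }

    public static void main(String[] args) {
        System.out.println(newScanner(MatchVersion.LUCENE_29).grammar()); // java14
        System.out.println(newScanner(MatchVersion.LUCENE_30).grammar()); // java5
    }
}
```

Making the frozen impl a subclass keeps callers typed against one base class, so only the factory needs to know which JFlex output is in use.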

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Monday, November 16, 2009 9:59 PM
> To: java-dev@lucene.apache.org
> Subject: RE: Why release 3.0?
> 
> OK, I checked. The JFlex-generated file in trunk was generated with 1.4. I
> regenerated it with 1.5 and it was different (completely!). I saved the old
> version and renamed it to StandardTokenizerImplJava14, which extends
> StandardTokenizerImpl.
> 
> This way, the impl is exchanged depending on version. The 1.4 version can
> no longer be regenerated because it has no .jflex file, and it should
> really never be regenerated.
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 




RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

OK, I checked. The JFlex-generated file in trunk was generated with 1.4. I
regenerated it with 1.5 and it was different (completely!). I saved the old
version and renamed it to StandardTokenizerImplJava14, which extends
StandardTokenizerImpl.

This way, the impl is exchanged depending on version. The 1.4 version can no
longer be regenerated because it has no .jflex file, and it should really
never be regenerated.
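The regenerated scanner differs because, as the jflex NOTE in StandardTokenizer says, macros such as :digit: and :letter: take their meaning from the JRE that runs jflex. A small probe, using only java.lang.Character, prints the general category the running JRE reports for the two characters discussed in this thread: the soft hyphen (U+00AD) and ARABIC END OF AYAH (U+06DD). Per the values reported in the thread, Java 1.4 gave 20 (Character.DASH_PUNCTUATION) and 7 (Character.ENCLOSING_MARK), while Java 5 gave 16 (Character.FORMAT) for both. The class and method names are hypothetical.

```java
// Hypothetical probe: reports which Unicode category data the running JRE uses
// for two characters whose properties changed between Java 1.4 and Java 5.
public class UnicodeCategoryProbe {

    /** Maps the few Character category constants relevant here to readable names. */
    static String categoryName(int type) {
        switch (type) {
            case Character.FORMAT:           return "FORMAT";
            case Character.DASH_PUNCTUATION: return "DASH_PUNCTUATION";
            case Character.ENCLOSING_MARK:   return "ENCLOSING_MARK";
            case Character.NON_SPACING_MARK: return "NON_SPACING_MARK";
            default:                         return "OTHER(" + type + ")";
        }
    }

    public static void main(String[] args) {
        char[] probes = { '\u00AD', '\u06DD' }; // soft hyphen, Arabic end of ayah
        System.out.println("java.version = " + System.getProperty("java.version"));
        for (char c : probes) {
            int type = Character.getType(c);
            System.out.printf("U+%04X -> %d (%s)%n", (int) c, type, categoryName(type));
        }
    }
}
```

On a Java 5 or later JRE both characters should report 16 (FORMAT). Running the probe under the JRE you plan to use for jflex is one cheap way to see which character data a regeneration would pick up.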

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Monday, November 16, 2009 9:45 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Why release 3.0?
> 
> I still recommend we add a file then, HowToRegenJflex.txt or something,
> that specifically says to use 1.5 or 1.6. I don't think changing the current
> notice/warning is visible enough to ensure someone doesn't break this.
> 
> Robert Muir wrote:
> > no. it's still Unicode 4.0, but i hear 1.7 will be 5.1 or 5.2
> >
> > the only way to truly control this would be to use something like ICU
> > to control the unicode version being used (and it would actually be
> > faster, and support a higher version).
> > see http://site.icu-project.org/home/why-use-icu4j
> >
> > the issue is that lucene does not have 3rd party library dependencies.
> > on the other hand, i think tika and/or nutch already incorporate icu
> > for charset detection.
> >
> > i won't argue for this really, i know nobody wants it, but you can see
> > how the situation of not being able to control unicode semantics is
> > really difficult for a search engine.
> >
> > On Mon, Nov 16, 2009 at 3:33 PM, Uwe Schindler <uschindler@pangaea.de> wrote:
> >
> >     Did 1.6 change the unicode version? Robert?
> >
> >     -----
> >     UWE SCHINDLER
> >     Webserver/Middleware Development
> >     PANGAEA - Publishing Network for Geoscientific and Environmental Data
> >     MARUM - University of Bremen
> >     Room 2500, Leobener Str., D-28359 Bremen
> >     Tel.: +49 421 218 65595
> >     Fax:  +49 421 218 65505
> >     http://www.pangaea.de/
> >     E-mail: uschindler@pangaea.de
> >
> >     > -----Original Message-----
> >     > From: Mark Miller [mailto:markrmiller@gmail.com]
> >     > Sent: Monday, November 16, 2009 9:30 PM
> >     > To: java-dev@lucene.apache.org <ma...@lucene.apache.org>
> >     > Subject: Re: Why release 3.0?
> >     >
> >     > And what happens when someone regenerates it with 1.6 without knowing?
> >     >
> >     > Uwe Schindler wrote:
> >     > > I'll check this by generating the file with 1.4 and 1.5. The 1.4
> >     > > version will not change anymore, so we just keep the java file and
> >     > > no longer a jflex file for it. The old one is used for Lucene up to
> >     > > 2.9; if you use matchVersion=LUCENE_30, the new one is used, which
> >     > > can also be regenerated.
> >     > >
> >     > > -----
> >     > > Uwe Schindler
> >     > > H.-H.-Meier-Allee 63, D-28213 Bremen
> >     > > http://www.thetaphi.de
> >     > > eMail: uwe@thetaphi.de <ma...@thetaphi.de>
> >     > >
> >     > >
> >     > >> -----Original Message-----
> >     > >> From: Mark Miller [mailto:markrmiller@gmail.com]
> >     > >> Sent: Monday, November 16, 2009 9:21 PM
> >     > >> To: java-dev@lucene.apache.org
> >     > >> Subject: Re: Why release 3.0?
> >     > >>
> >     > >> Good point, and that likely means the current warning is not
> >     > >> working. What can we do to improve it?
> >     > >>
> >     > >> Perhaps a new text file called jflexregen or something, that
> >     > >> specifically says you must use Java 1.5?
> >     > >>
> >     > >> Uwe Schindler wrote:
> >     > >>
> >     > >>> I think the code in Standard has not been regenerated with 1.4
> >     > >>> for years :-) Most developers use 1.5 or even 1.6. So it has
> >     > >>> already changed incompatibly.
> >     > >>>
> >     > >>>
> >     > >>>
> >     > >>> -----
> >     > >>> Uwe Schindler
> >     > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> >     > >>> http://www.thetaphi.de
> >     > >>> eMail: uwe@thetaphi.de <ma...@thetaphi.de>
> >     > >>>
> >     > >>>
> >     > >>> --------------------------------------------------------------------
> >     > >>>
> >     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> >     > >>> *Sent:* Monday, November 16, 2009 8:52 PM
> >     > >>> *To:* java-dev@lucene.apache.org
> >     > >>> *Subject:* Re: Why release 3.0?
> >     > >>>
> >     > >>>
> >     > >>>
> >     > >>> Uwe, that's probably a good solution, I think, just as long as we
> >     > >>> document it somewhere. I think there is some warning verbiage in
> >     > >>> StandardTokenizer already about this.
> >     > >>>
> >     > >>> NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
> >     > >>>       the tokenizer, remember to use JRE 1.4 to run jflex (before
> >     > >>>       Lucene 3.0).  This grammar now uses constructs (eg :digit:,
> >     > >>>       :letter:) whose meaning can vary according to the JRE used to
> >     > >>>       run jflex.  See
> >     > >>>       https://issues.apache.org/jira/browse/LUCENE-1126 for details.
> >     > >>>
> >     > >>> On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >     > >>>
> >     > >>> But it is a general warning that should be placed in the Wiki:
> >     > >>> If you upgrade from Java 1.4 to Java 5, think about reindexing.
> >     > >>>
> >     > >>>
> >     > >>>
> >     > >>> It definitely has nothing to do with 3.0, because users could have
> >     > >>> changed (and most of them have) before.
> >     > >>>
> >     > >>> -----
> >     > >>> Uwe Schindler
> >     > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> >     > >>> http://www.thetaphi.de
> >     > >>> eMail: uwe@thetaphi.de
> >     > >>>
> >     > >>>
> >     > >>> --------------------------------------------------------------------
> >     > >>>
> >     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> >     > >>> *Sent:* Monday, November 16, 2009 8:45 PM
> >     > >>>
> >     > >>>
> >     > >>> *To:* java-dev@lucene.apache.org
> >     > >>> *Subject:* Re: Why release 3.0?
> >     > >>>
> >     > >>>
> >     > >>>
> >     > >>> right, my point is it's true it has nothing to do with Lucene at
> >     > >>> all, really, but the reality is we should clarify this to users,
> >     > >>> I think.
> >     > >>>
> >     > >>> It's especially complex in the current StandardTokenizer, which
> >     > >>> uses a mix of hardcoded ranges and properties. Can you tell me if
> >     > >>> you should reindex for a given language X?
> >     > >>> I wouldn't want to answer that question right now.
> >     > >>>
> >     > >>> On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler
> >     <uwe@thetaphi.de <ma...@thetaphi.de>
> >     > >>> <mailto:uwe@thetaphi.de <ma...@thetaphi.de>>> wrote:
> >     > >>>
> >     > >>> We tried out: Character.getType() for these two chars:
> >     > >>>
> >     > >>>
> >     > >>>
> >     > >>> Java 5:
> >     > >>> '\u00AD' = 16
> >     > >>> '\u06DD' = 16
> >     > >>>
> >     > >>> Java 1.4:
> >     > >>> '\u00AD' = 20
> >     > >>> '\u06DD' = 7
> >     > >>>
> >     > >>>
> >     > >>>
> >     > >>> The first is the soft hyphen.
> >     > >>>
> >     > >>> -----
> >     > >>> Uwe Schindler
> >     > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> >     > >>> http://www.thetaphi.de
> >     > >>> eMail: uwe@thetaphi.de <ma...@thetaphi.de>
> >     <mailto:uwe@thetaphi.de <ma...@thetaphi.de>>
> >     > >>>
> >     > >>>
> >     --------------------------------------------------------------------
> --
> >     > --
> >     > >>>
> >     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com
> >     <ma...@gmail.com>
> >     > <mailto:rcmuir@gmail.com <ma...@gmail.com>>]
> >     > >>> *Sent:* Monday, November 16, 2009 8:37 PM
> >     > >>>
> >     > >>>
> >     > >>> *To:* java-dev@lucene.apache.org
> >     <ma...@lucene.apache.org>
> >     <mailto:java-dev@lucene.apache.org
> >     <ma...@lucene.apache.org>>
> >     > >>> *Subject:* Re: Why release 3.0?
> >     > >>>
> >     > >>>
> >     > >>>
> >     > >>> right, its nothing to do with lucene, instead due to
> >     property changes,
> >     > >>> etc.
> >     > >>>
> >     > >>> i just think we should inform users on java 1.4/2.9 that if
> they
> >     > >>> upgrade to java 1.5/3.0, they should reindex.
> >     > >>>
> >     > >>> the reason i say this about properties, is there are some
> >     that change
> >     > >>> that will affect tokenizers, i give two examples, a hyphen
> that
> >     > >>> changes from punctuation to format (might affect
> >     > >>>
> >     > >> SolrWordDelimiterFilter),
> >     > >>
> >     > >>> and arabic ayah which changes from NSM to format, which
> >     surely affects
> >     > >>> ArabicLetterTokenizer.
> >     > >>>
> >     > >>> On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe
> >     <sarowe@syr.edu <ma...@syr.edu>
> >     > >>> <mailto:sarowe@syr.edu <ma...@syr.edu>>> wrote:
> >     > >>>
> >     > >>> Hi Robert,
> >     > >>>
> >     > >>> I agree that the Unicode version supported by the JVM, as
> >     you say,
> >     > >>> really has nothing to do with Lucene.
> >     > >>>
> >     > >>> The disruption here is users' upgrading from Java 1.4 to
> >     1.5+, not
> >     > >>> when they upgrade Lucene.  I'd guess with few exceptions
> >     that most
> >     > >>> people have been using Lucene with 1.5+ for a couple of
> >     years now,
> >     > >>>
> >     > >> though.
> >     > >>
> >     > >>> But even the upgrade from Java 1.4 to 1.5+ will have (had)
> >     zero impact
> >     > >>> on most Lucene users, assuming that most use Latin-1
> >     exclusively;
> >     > >>> although I haven't looked, I'd be surprised if Latin-1
> >     characters
> >     > >>> changed much, if at all, from Unicode 3.0 to 4.0.
> >     > >>>
> >     > >>> It would be useful, I think, to include (a pointer to?) a
> >     description
> >     > >>> of the details of the Unicode 3.0->4.0 differences in the
> >     Lucene 3.0
> >     > >>> release notes, since the minimum required Java version, and
> >     so also
> >     > >>> the supported Unicode version, changes then.
> >     > >>>
> >     > >>> Steve
> >     > >>>
> >     > >>>
> >     > >>> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> >     > >>>
> >     > >>>> the problem is that the properties have changed for various
> >     > >>>>
> >     > >> characters,
> >     > >>
> >     > >>>> and new characters were added.
> >     > >>>>
> >     > >>>> it really has nothing to do with lucene, but the idea you
> >     can go from
> >     > >>>> jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing
> >     is not
> >     > >>>>
> >     > >> true.
> >     > >>
> >     > >>>> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler
> >     <uwe@thetaphi.de <ma...@thetaphi.de>
> >     > >>>>
> >     > >>> <mailto:uwe@thetaphi.de <ma...@thetaphi.de>>> wrote:
> >     > >>>
> >     > >>>>       But an UTF-8 stream from Java 4 can still be read
> >     with Java 5,
> >     > >>>> what is the problem? Java 5 extended Unicode support, but
> >     an index
> >     > >>>> created with older versions can still be read. UTF-8 is
> >     standardized.
> >     > >>>>
> >     > >>>>
> >     > >>>>
> >     > >>>>       -----
> >     > >>>>       Uwe Schindler
> >     > >>>>       H.-H.-Meier-Allee 63, D-28213 Bremen
> >     > >>>>       http://www.thetaphi.de
> >     > >>>>       eMail: uwe@thetaphi.de <ma...@thetaphi.de>
> >     <mailto:uwe@thetaphi.de <ma...@thetaphi.de>>
> >     > >>>>
> >     > >>>>
> >     > >>>> ________________________________
> >     > >>>>
> >     > >>>>
> >     > >>>>       From: Robert Muir [mailto:rcmuir@gmail.com
> >     <ma...@gmail.com>
> >     > >>>>
> >     > >>> <mailto:rcmuir@gmail.com <ma...@gmail.com>>]
> >     > >>>
> >     > >>>>       Sent: Monday, November 16, 2009 8:09 PM
> >     > >>>>
> >     > >>>>       To: java-dev@lucene.apache.org
> >     <ma...@lucene.apache.org> <mailto:java- <mailto:java->
> >     > >>>>
> >     > >> dev@lucene.apache.org <ma...@lucene.apache.org>>
> >     > >>
> >     > >>>>       Subject: Re: Why release 3.0?
> >     > >>>>
> >     > >>>>
> >     > >>>>
> >     > >>>>       uwe, on topic please read my comment on LUCENE-1689,
> >     because
> >     > >>>> unicode version was bumped in jdk 1.5, i believe this index
> >     backwards
> >     > >>>> compatibility is only theoretical
> >     > >>>>
> >     > >>>>       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler
> >     <uwe@thetaphi.de <ma...@thetaphi.de>
> >     > >>>>
> >     > >>> <mailto:uwe@thetaphi.de <ma...@thetaphi.de>>> wrote:
> >     > >>>
> >     > >>>>       2.9 has *not* the same format as 3.0, an index
> >     created with 3.0
> >     > >>>> cannot be read with 2.9. This is because compressed field
> >     support was
> >     > >>>> removed and therefore the version number of the stored
> >     fields file
> >     > was
> >     > >>>> upgraded. But indexes from 2.9 can be read with 3.0 and
> >     support may
> >     > >>>>
> >     > >> get
> >     > >>
> >     > >>>> removed in 4.0. 3.0 Indexes can be read until version 4.9.
> >     > >>>>
> >     > >>>>
> >     > >>>>
> >     > >>>>       Uwe
> >     > >>>>
> >     > >>>>       -----
> >     > >>>>       Uwe Schindler
> >     > >>>>       H.-H.-Meier-Allee 63, D-28213 Bremen
> >     > >>>>       http://www.thetaphi.de
> >     > >>>>       eMail: uwe@thetaphi.de <ma...@thetaphi.de>
> >     <mailto:uwe@thetaphi.de <ma...@thetaphi.de>>
> >     > >>>>
> >     > >>>>
> >     > >>>> ________________________________
> >     > >>>>
> >     > >>>>
> >     > >>>>       From: Jake Mannix [mailto:jake.mannix@gmail.com
> >     <ma...@gmail.com>
> >     > >>>>
> >     > >>> <mailto:jake.mannix@gmail.com <ma...@gmail.com>>]
> >     > >>>
> >     > >>>>       Sent: Monday, November 16, 2009 7:15 PM
> >     > >>>>
> >     > >>>>
> >     > >>>>       To: java-dev@lucene.apache.org
> >     <ma...@lucene.apache.org> <mailto:java- <mailto:java->
> >     > >>>>
> >     > >> dev@lucene.apache.org <ma...@lucene.apache.org>>
> >     > >>
> >     > >>>>       Subject: Re: Why release 3.0?
> >     > >>>>
> >     > >>>>
> >     > >>>>
> >     > >>>>       Don't users need to upgrade to 3.0 because 3.1 won't be
> >     > >>>> necessarily able to read your
> >     > >>>>       2.4 index file formats?  I suppose if you've already
> >     upgraded
> >     > to
> >     > >>>> 2.9, then all is well because
> >     > >>>>       2.9 is the same format as 3.0, but we can't assume
> >     all users
> >     > >>>> upgraded from 2.4 to 2.9.
> >     > >>>>
> >     > >>>>       If you've done that already, then 3.0 might not be
> >     necessary,
> >     > >>>> but if you're on 2.4 right now,
> >     > >>>>       you will be in for a bad surprise if you try to
> >     upgrade to 3.1.
> >     > >>>>
> >     > >>>>         -jake
> >     > >>>>
> >     > >>>>       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> >     > >>>> <erickerickson@gmail.com <ma...@gmail.com>
> >     <mailto:erickerickson@gmail.com <ma...@gmail.com>>>
> >     wrote:
> >     > >>>>
> >     > >>>>       One of my "specialties" is asking obvious questions
> >     just to see
> >     > >>>> if everyone's assumptions are aligned. So with the
> >     discussion about
> >     > >>>> branching 3.0 I have to ask "Is there going to be any 3.0
> >     release
> >     > >>>> intended for *production*?". And if not, would we save a lot
> of
> >     > >>>> work by just not worrying about retrofitting fixes to a 3.0
> >     branch
> >     > >>>> and carrying on with 3.1 as the first *supported* 3.x
> release?
> >     > >>>>
> >     > >>>>       Since 3.0 is "upgrade-to-java5 and remove
> >     deprecations", I'm
> >     > not
> >     > >>>> sure *as a user* I see a good reason to upgrade to 3.0.
> >     Getting a
> >     > >>>> "beta/snapshot" release to get a head start on cleaning up
> >     my code
> >     > >>>> does seem worthwhile, if I have the spare time. And having
> >     a base
> >     > >>>> 3.0 version that's not changing all over the place would be
> >     useful
> >     > >>>> for that.
> >     > >>>>
> >     > >>>>       That said, I'm also not terribly comfortable with a
> >     "release"
> >     > >>>> that's out there and unsupported.
> >     > >>>>
> >     > >>>>       Apologies if this has already been discussed, but I
> don't
> >     > >>>> remember it. Although my memory isn't what it used to be (but
> >     > >>>> some would claim it never was<G>)...
> >     > >>>>
> >     > >>>>       Erick
> >     > >>>>
> >     > >>>
> >     > >>>
> >     > >>> --
> >     > >>> Robert Muir
> >     > >>> rcmuir@gmail.com <ma...@gmail.com>
> >     <mailto:rcmuir@gmail.com <ma...@gmail.com>>
> >     > >>>
> >     > >>>
> >     > >>>
> >     > >>>
> >     > >>> --
> >     > >>> Robert Muir
> >     > >>> rcmuir@gmail.com <ma...@gmail.com>
> >     <mailto:rcmuir@gmail.com <ma...@gmail.com>>
> >     > >>>
> >     > >>>
> >     > >>>
> >     > >>>
> >     > >>> --
> >     > >>> Robert Muir
> >     > >>> rcmuir@gmail.com <ma...@gmail.com>
> >     <mailto:rcmuir@gmail.com <ma...@gmail.com>>
> >     > >>>
> >     > >>>
> >     > >> --
> >     > >> - Mark
> >     > >>
> >     > >> http://www.lucidimagination.com
> >     > >>
> >     > >>
> >     > >>
> >     > >>
> >     > >>
> >     --------------------------------------------------------------------
> -
> >     > >> To unsubscribe, e-mail:
> >     java-dev-unsubscribe@lucene.apache.org
> >     <ma...@lucene.apache.org>
> >     > >> For additional commands, e-mail:
> >     java-dev-help@lucene.apache.org
> >     <ma...@lucene.apache.org>
> >     > >>
> >     > >
> >     > >
> >     > >
> >     > >
> >     --------------------------------------------------------------------
> -
> >     > > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >     <ma...@lucene.apache.org>
> >     > > For additional commands, e-mail:
> >     java-dev-help@lucene.apache.org
> >     <ma...@lucene.apache.org>
> >     > >
> >     > >
> >     >
> >     >
> >     > --
> >     > - Mark
> >     >
> >     > http://www.lucidimagination.com
> >     >
> >     >
> >     >
> >     >
> >     >
> >     --------------------------------------------------------------------
> -
> >     > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >     <ma...@lucene.apache.org>
> >     > For additional commands, e-mail: java-dev-help@lucene.apache.org
> >     <ma...@lucene.apache.org>
> >
> >
> >
> >     --------------------------------------------------------------------
> -
> >     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >     <ma...@lucene.apache.org>
> >     For additional commands, e-mail: java-dev-help@lucene.apache.org
> >     <ma...@lucene.apache.org>
> >
> >
> >
> >
> > --
> > Robert Muir
> > rcmuir@gmail.com <ma...@gmail.com>
> 
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: Why release 3.0?

Posted by Mark Miller <ma...@gmail.com>.

I still recommend we add a file then, HowToRegenJflex.txt or something,
that specifically says to use 1.5 or 1.6. I don't think changing the current
notice/warning makes it visible enough to ensure someone doesn't break this.
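Mark's suggestion could also be backed by a check that runs before jflex. A minimal sketch, assuming the regeneration step can call a small Java class first. `JflexRegenGuard` is a hypothetical name, and the allowed versions are simply the 1.5 and 1.6 recommended here (both of which, per Robert, implement Unicode 4.0). Nothing like this is claimed to exist in Lucene's build:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical pre-regeneration check; not part of Lucene's build.
class JflexRegenGuard {
    // JRE specification versions recommended in this thread for running jflex
    // on StandardTokenizerImpl.jflex (assumption taken from the discussion).
    static final Set<String> ALLOWED =
        new HashSet<String>(Arrays.asList("1.5", "1.6"));

    /** True if a JRE reporting this java.specification.version may regenerate the tokenizer. */
    static boolean isAllowed(String specVersion) {
        return specVersion != null && ALLOWED.contains(specVersion);
    }

    /** Throws if the running JRE is not on the allowed list. */
    static void checkCurrentJre() {
        String v = System.getProperty("java.specification.version");
        if (!isAllowed(v)) {
            throw new IllegalStateException("Refusing to regenerate StandardTokenizerImpl on Java " + v
                + ": jflex classes such as :letter: may change meaning with the JRE");
        }
    }
}
```

A build step could call `JflexRegenGuard.checkCurrentJre()` just before invoking jflex; on any other JRE it throws, which is the intended outcome for this grammar.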

Robert Muir wrote:
> no, it's still Unicode 4.0, but i hear 1.7 will be 5.1 or 5.2
>
> the only way to truly control this would be to use something like ICU
> to control the unicode version being used (which would actually be faster,
> and support a higher version).
> see http://site.icu-project.org/home/why-use-icu4j
>
> the issue is that lucene does not have 3rd party library dependencies;
> on the other hand, i think tika and/or nutch already incorporate icu
> for charset detection.
>
> i won't argue for this really, i know nobody wants it, but you can see
> how the situation of not being able to control unicode semantics is
> really difficult for a search engine.
>
> On Mon, Nov 16, 2009 at 3:33 PM, Uwe Schindler <uschindler@pangaea.de> wrote:
>
>     Did 1.6 change the unicode version? Robert?
>
>     -----
>     UWE SCHINDLER
>     Webserver/Middleware Development
>     PANGAEA - Publishing Network for Geoscientific and Environmental Data
>     MARUM - University of Bremen
>     Room 2500, Leobener Str., D-28359 Bremen
>     Tel.: +49 421 218 65595
>     Fax:  +49 421 218 65505
>     http://www.pangaea.de/
>     E-mail: uschindler@pangaea.de
>
>     > -----Original Message-----
>     > From: Mark Miller [mailto:markrmiller@gmail.com]
>     > Sent: Monday, November 16, 2009 9:30 PM
>     > To: java-dev@lucene.apache.org
>     > Subject: Re: Why release 3.0?
>     >
>     > And what happens when someone regenerates it with 1.6 without knowing?
>     >
>     > Uwe Schindler wrote:
>     > > I checked this by generating the file with 1.4 and 1.5. The 1.4 version
>     > > will not change anymore, so we just keep the generated Java file and no
>     > > longer its jflex grammar. The old one is used for Lucene up to 2.9; if
>     > > you use matchVersion=LUCENE_30, the new one is used, which can also be
>     > > regenerated.
>     > >
>     > > -----
>     > > Uwe Schindler
>     > > H.-H.-Meier-Allee 63, D-28213 Bremen
>     > > http://www.thetaphi.de
>     > > eMail: uwe@thetaphi.de
>     > >
>     > >
>     > >> -----Original Message-----
>     > >> From: Mark Miller [mailto:markrmiller@gmail.com]
>     > >> Sent: Monday, November 16, 2009 9:21 PM
>     > >> To: java-dev@lucene.apache.org
>     > >> Subject: Re: Why release 3.0?
>     > >>
>     > >> Good point - and that likely means the current warning is not working -
>     > >> what can we do to improve it?
>     > >>
>     > >> Perhaps a new text file called jflexregen or something, and it
>     > >> specifically says you must use java 1.5?
>     > >>
>     > >> Uwe Schindler wrote:
>     > >>
>     > >>> I think the code in StandardTokenizer has not been regenerated with
>     > >>> 1.4 for years :-) Most developers use 1.5 or even 1.6, so it has
>     > >>> already changed incompatibly.
>     > >>>
>     > >>>
>     > >>>
>     > >>> -----
>     > >>> Uwe Schindler
>     > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
>     > >>> http://www.thetaphi.de
>     > >>> eMail: uwe@thetaphi.de
>     > >>>
>     > >>>
>     > >>> ------------------------------------------------------------------------
>     > >>>
>     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
>     > >>> *Sent:* Monday, November 16, 2009 8:52 PM
>     > >>> *To:* java-dev@lucene.apache.org
>     > >>> *Subject:* Re: Why release 3.0?
>     > >>>
>     > >>>
>     > >>>
>     > >>> Uwe, that's probably a good solution I think, just as long as we
>     > >>> document it somewhere. I think there is some warning verbiage in
>     > >>> StandardTokenizer already about this:
>     > >>>
>     > >>> NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
>     > >>>       the tokenizer, remember to use JRE 1.4 to run jflex (before
>     > >>>       Lucene 3.0).  This grammar now uses constructs (eg :digit:,
>     > >>>       :letter:) whose meaning can vary according to the JRE used to
>     > >>>       run jflex.  See
>     > >>>       https://issues.apache.org/jira/browse/LUCENE-1126 for details.
>     > >>>
>     > >>> On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>     > >>>
>     > >>> But it is a general warning that should be placed in the Wiki: If you
>     > >>> upgrade from Java 1.4 to Java 5, think about reindexing.
>     > >>>
>     > >>> It definitely has nothing to do with 3.0, because users could have
>     > >>> switched to Java 5 before (and most of them have).
>     > >>>
>     > >>> -----
>     > >>> Uwe Schindler
>     > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
>     > >>> http://www.thetaphi.de
>     > >>> eMail: uwe@thetaphi.de
>     > >>>
>     > >>>
>     > >>> ------------------------------------------------------------------------
>     > >>>
>     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
>     > >>> *Sent:* Monday, November 16, 2009 8:45 PM
>     > >>>
>     > >>>
>     > >>> *To:* java-dev@lucene.apache.org
>     > >>> *Subject:* Re: Why release 3.0?
>     > >>>
>     > >>>
>     > >>>
>     > >>> right, my point is it's true it's nothing to do with Lucene at all,
>     > >>> really. but the reality is we should clarify this to users I think.
>     > >>>
>     > >>> It's especially complex in the current StandardTokenizer, which uses a
>     > >>> mix of hardcoded ranges and properties: can you tell me if you should
>     > >>> reindex for a given language X?
>     > >>> I wouldn't want to answer that question right now.
>     > >>>
>     > >>> On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>     > >>>
>     > >>> We tried out Character.getType() for these two chars:
>     > >>>
>     > >>>
>     > >>>
>     > >>> Java 5:
>     > >>> '\u00AD' = 16
>     > >>> '\u06DD' = 16
>     > >>>
>     > >>> Java 1.4:
>     > >>> '\u00AD' = 20
>     > >>> '\u06DD' = 7
>     > >>>
>     > >>>
>     > >>>
>     > >>> The first is the soft hyphen.
>     > >>>
>     > >>> -----
>     > >>> Uwe Schindler
>     > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
>     > >>> http://www.thetaphi.de
>     > >>> eMail: uwe@thetaphi.de
>     > >>>
>     > >>>
>     > >>> ------------------------------------------------------------------------
>     > >>>
>     > >>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
>     > >>> *Sent:* Monday, November 16, 2009 8:37 PM
>     > >>>
>     > >>>
>     > >>> *To:* java-dev@lucene.apache.org
>     > >>> *Subject:* Re: Why release 3.0?
>     > >>>
>     > >>>
>     > >>>
>     > >>> right, it's nothing to do with lucene; it's due to property changes,
>     > >>> etc.
>     > >>>
>     > >>> i just think we should inform users on java 1.4/2.9 that if they
>     > >>> upgrade to java 1.5/3.0, they should reindex.
>     > >>>
>     > >>> the reason i say this about properties is that there are some that
>     > >>> change that will affect tokenizers. i give two examples: a hyphen that
>     > >>> changes from punctuation to format (might affect
>     > >>> SolrWordDelimiterFilter), and arabic ayah, which changes from NSM to
>     > >>> format, which surely affects ArabicLetterTokenizer.
>     > >>>
>     > >>> On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe@syr.edu> wrote:
>     > >>>
>     > >>> Hi Robert,
>     > >>>
>     > >>> I agree that the Unicode version supported by the JVM, as you say,
>     > >>> really has nothing to do with Lucene.
>     > >>>
>     > >>> The disruption here is users' upgrading from Java 1.4 to 1.5+, not
>     > >>> when they upgrade Lucene.  I'd guess with few exceptions that most
>     > >>> people have been using Lucene with 1.5+ for a couple of years now,
>     > >>> though.
>     > >>>
>     > >>> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact
>     > >>> on most Lucene users, assuming that most use Latin-1 exclusively;
>     > >>> although I haven't looked, I'd be surprised if Latin-1 characters
>     > >>> changed much, if at all, from Unicode 3.0 to 4.0.
>     > >>>
>     > >>> It would be useful, I think, to include (a pointer to?) a description
>     > >>> of the details of the Unicode 3.0->4.0 differences in the Lucene 3.0
>     > >>> release notes, since the minimum required Java version, and so also
>     > >>> the supported Unicode version, changes then.
>     > >>>
>     > >>> Steve
>     > >>>
>     > >>>
>     > >>> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
>     > >>>
>     > >>>> the problem is that the properties have changed for various
>     > >>>> characters, and new characters were added.
>     > >>>>
>     > >>>> it really has nothing to do with lucene, but the idea you can go from
>     > >>>> jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not
>     > >>>> true.
>     > >>
>     > >>>> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>     > >>>>
>     > >>>>       But a UTF-8 stream from Java 1.4 can still be read with Java 5,
>     > >>>>       so what is the problem? Java 5 extended Unicode support, but an
>     > >>>>       index created with older versions can still be read. UTF-8 is
>     > >>>>       standardized.
>     > >>>>
>     > >>>>
>     > >>>>
>     > >>>>       -----
>     > >>>>       Uwe Schindler
>     > >>>>       H.-H.-Meier-Allee 63, D-28213 Bremen
>     > >>>>       http://www.thetaphi.de
>     > >>>>       eMail: uwe@thetaphi.de
>     > >>>>
>     > >>>>
>     > >>>> ________________________________
>     > >>>>
>     > >>>>
>     > >>>>       From: Robert Muir [mailto:rcmuir@gmail.com]
>     > >>>>       Sent: Monday, November 16, 2009 8:09 PM
>     > >>>>
>     > >>>>       To: java-dev@lucene.apache.org
>     > >>>>       Subject: Re: Why release 3.0?
>     > >>>>
>     > >>>>
>     > >>>>
>     > >>>>       uwe, on topic, please read my comment on LUCENE-1689: because the
>     > >>>>       unicode version was bumped in jdk 1.5, i believe this index
>     > >>>>       backwards compatibility is only theoretical.
>     > >>>>
>     > >>>>       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>     > >>>>
>     > >>>>       2.9 does *not* have the same format as 3.0: an index created with
>     > >>>>       3.0 cannot be read with 2.9. This is because compressed field
>     > >>>>       support was removed and therefore the version number of the
>     > >>>>       stored fields file was upgraded. But indexes from 2.9 can be read
>     > >>>>       with 3.0, and support may get removed in 4.0. 3.0 indexes can be
>     > >>>>       read until version 4.9.
>     > >>>>
>     > >>>>
>     > >>>>
>     > >>>>       Uwe
>     > >>>>
>     > >>>>       -----
>     > >>>>       Uwe Schindler
>     > >>>>       H.-H.-Meier-Allee 63, D-28213 Bremen
>     > >>>>       http://www.thetaphi.de
>     > >>>>       eMail: uwe@thetaphi.de
>     > >>>>
>     > >>>>
>     > >>>> ________________________________
>     > >>>>
>     > >>>>
>     > >>>>       From: Jake Mannix [mailto:jake.mannix@gmail.com]
>     > >>>>       Sent: Monday, November 16, 2009 7:15 PM
>     > >>>>
>     > >>>>
>     > >>>>       To: java-dev@lucene.apache.org
>     > >>>>       Subject: Re: Why release 3.0?
>     > >>>>
>     > >>>>
>     > >>>>
>     > >>>>       Don't users need to upgrade to 3.0 because 3.1 won't necessarily
>     > >>>>       be able to read your 2.4 index file formats?  I suppose if you've
>     > >>>>       already upgraded to 2.9, then all is well because 2.9 is the same
>     > >>>>       format as 3.0, but we can't assume all users upgraded from 2.4
>     > >>>>       to 2.9.
>     > >>>>
>     > >>>>       If you've done that already, then 3.0 might not be necessary, but
>     > >>>>       if you're on 2.4 right now, you will be in for a bad surprise if
>     > >>>>       you try to upgrade to 3.1.
>     > >>>>
>     > >>>>         -jake
>     > >>>>
>     > >>>>       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
>     > >>>>       <erickerickson@gmail.com> wrote:
>     > >>>>
>     > >>>>       One of my "specialties" is asking obvious questions just to see
>     > >>>>       if everyone's assumptions are aligned. So with the discussion
>     > >>>>       about branching 3.0 I have to ask "Is there going to be any 3.0
>     > >>>>       release intended for *production*?". And if not, would we save a
>     > >>>>       lot of work by just not worrying about retrofitting fixes to a 3.0
>     > >>>>       branch and carrying on with 3.1 as the first *supported* 3.x
>     > >>>>       release?
>     > >>>>
>     > >>>>       Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
>     > >>>>       sure *as a user* I see a good reason to upgrade to 3.0. Getting a
>     > >>>>       "beta/snapshot" release to get a head start on cleaning up my
>     > >>>>       code does seem worthwhile, if I have the spare time. And having a
>     > >>>>       base 3.0 version that's not changing all over the place would be
>     > >>>>       useful for that.
>     > >>>>
>     > >>>>       That said, I'm also not terribly comfortable with a "release"
>     > >>>>       that's out there and unsupported.
>     > >>>>
>     > >>>>       Apologies if this has already been discussed, but I don't
>     > >>>>       remember it. Although my memory isn't what it used to be (but
>     > >>>>       some would claim it never was<G>)...
>     > >>>>
>     > >>>>       Erick
>     > >>>>
>     > >>>
>     > >>>
>     > >>> --
>     > >>> Robert Muir
>     > >>> rcmuir@gmail.com
>     > >>>
>     > >>>
>     > >>> --
>     > >>> Robert Muir
>     > >>> rcmuir@gmail.com
>     > >>>
>     > >>>
>     > >>> --
>     > >>> Robert Muir
>     > >>> rcmuir@gmail.com
>     > >>>
>     > >>>
>     > >> --
>     > >> - Mark
>     > >>
>     > >> http://www.lucidimagination.com
>     > >>
>     >
>     >
>     > --
>     > - Mark
>     >
>     > http://www.lucidimagination.com
>
>
>
>
> -- 
> Robert Muir
> rcmuir@gmail.com
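Uwe's `Character.getType()` results can be turned into a repeatable check. A minimal sketch, assuming the Java 1.4 values are exactly the two reported in the quoted messages. By the `java.lang.Character` constants, 20 is `DASH_PUNCTUATION`, 7 is `ENCLOSING_MARK` (an enclosing rather than a non-spacing mark) and 16 is `FORMAT`. The class name `UnicodeTypeDiff` is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: compares Character.getType() on the running JRE
// with the Java 1.4 values reported in this thread.
class UnicodeTypeDiff {
    static final Map<Character, Integer> JAVA_14_TYPES = new LinkedHashMap<Character, Integer>();
    static {
        JAVA_14_TYPES.put('\u00AD', 20); // SOFT HYPHEN; 20 == Character.DASH_PUNCTUATION
        JAVA_14_TYPES.put('\u06DD', 7);  // ARABIC END OF AYAH; 7 == Character.ENCLOSING_MARK
    }

    /** Returns each character whose general category differs now, mapped to its current type. */
    static Map<Character, Integer> changedSinceJava14() {
        Map<Character, Integer> changed = new LinkedHashMap<Character, Integer>();
        for (Map.Entry<Character, Integer> e : JAVA_14_TYPES.entrySet()) {
            int now = Character.getType(e.getKey().charValue());
            if (now != e.getValue().intValue()) {
                changed.put(e.getKey(), now);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        for (Map.Entry<Character, Integer> e : changedSinceJava14().entrySet()) {
            System.out.printf("U+%04X: %d on Java 1.4, %d on this JRE%n",
                (int) e.getKey().charValue(), JAVA_14_TYPES.get(e.getKey()), e.getValue());
        }
    }
}
```

On Java 5 or later both characters should be reported with type 16 (`FORMAT`). U+00AD sits inside Latin-1, so by these numbers at least one Latin-1 character did change category between Unicode 3.0 and 4.0.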


-- 
- Mark

http://www.lucidimagination.com
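The index-format rule described in the quoted messages (3.0 reads 2.9 indexes, 2.9 cannot read 3.0 indexes, 3.0 indexes stay readable through 4.9) comes down to a release reading indexes from its own major version and the one before it. A minimal sketch of that rule, assuming it applies uniformly and conservatively treating any index written by a newer release as unreadable. `IndexCompat` is hypothetical; Lucene itself decides this from format numbers stored in the index files (Uwe mentions the stored fields file's version number), not from release numbers:

```java
// Hypothetical model of the compatibility rule stated in this thread; not Lucene's API.
class IndexCompat {
    /**
     * True if release readerMajor.readerMinor is expected to open an index
     * written by release writerMajor.writerMinor.
     */
    static boolean canRead(int readerMajor, int readerMinor, int writerMajor, int writerMinor) {
        // Conservatively assume an older release cannot open an index written by a
        // newer one (e.g. 2.9 cannot read a 3.0 index).
        boolean writerIsNewer = writerMajor > readerMajor
            || (writerMajor == readerMajor && writerMinor > readerMinor);
        if (writerIsNewer) {
            return false;
        }
        // Back-compat spans the reader's own major version and the one before it.
        return readerMajor - writerMajor <= 1;
    }
}
```

For example, `IndexCompat.canRead(4, 9, 3, 0)` is true while `IndexCompat.canRead(5, 0, 3, 0)` is false, matching "3.0 indexes can be read until version 4.9".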




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
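Uwe's plan freezes the tokenizer generated with JRE 1.4 for back-compat and uses the regenerated one only for matchVersion=LUCENE_30 and later. A minimal sketch of that switch; `MatchVersion`, `TokenizerImpl`, `FrozenJre14Impl`, `RegeneratedImpl` and `TokenizerImplFactory` are hypothetical stand-ins, not Lucene's actual `Version` or tokenizer classes:

```java
// Hypothetical stand-in for a matchVersion setting; constants are declared in release order.
enum MatchVersion {
    LUCENE_24, LUCENE_29, LUCENE_30;

    boolean onOrAfter(MatchVersion other) {
        return compareTo(other) >= 0;
    }
}

// Hypothetical tokenizer implementations: only their origin matters here.
interface TokenizerImpl {
    String describe();
}

class FrozenJre14Impl implements TokenizerImpl {
    public String describe() { return "generated once with JRE 1.4, never regenerated"; }
}

class RegeneratedImpl implements TokenizerImpl {
    public String describe() { return "regenerated with JRE 1.5 or 1.6"; }
}

class TokenizerImplFactory {
    /** Old behavior for indexes built up to 2.9, regenerated grammar from 3.0 on. */
    static TokenizerImpl forVersion(MatchVersion matchVersion) {
        return matchVersion.onOrAfter(MatchVersion.LUCENE_30)
            ? new RegeneratedImpl()
            : new FrozenJre14Impl();
    }
}
```

Freezing the old generated source means its behavior no longer depends on whichever JRE a developer happens to run jflex with, which is the risk Mark raises; only the 3.0 grammar stays regenerable, so the regeneration notice matters only for that one.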

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

no, it's still Unicode 4.0, but i hear 1.7 will be 5.1 or 5.2

the only way to truly control this would be to use something like ICU to
control the unicode version being used (which would actually be faster, and
support a higher version).
see http://site.icu-project.org/home/why-use-icu4j

the issue is that lucene does not have 3rd party library dependencies; on
the other hand, i think tika and/or nutch already incorporate icu for
charset detection.

i won't argue for this really, i know nobody wants it, but you can see how
the situation of not being able to control unicode semantics is really
difficult for a search engine.
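Because the JRE, not Lucene, decides these character properties, one hypothetical mitigation (not proposed in this thread and not a Lucene feature) is to record a fingerprint of the JRE's character data when an index is built and compare it when the index is opened on another JRE. A minimal sketch, assuming a CRC32 over `Character.getType()` for every BMP code point is a good enough proxy; it would miss property changes that leave the general category unchanged, and `UnicodeFingerprint` is an illustrative name only:

```java
import java.util.zip.CRC32;

// Hypothetical helper: summarizes the running JRE's general-category data.
class UnicodeFingerprint {
    /** CRC32 over Character.getType(cp) for every code point in [from, to). */
    static long fingerprint(int from, int to) {
        CRC32 crc = new CRC32();
        for (int cp = from; cp < to; cp++) {
            // getType returns 0..30, so the single byte CRC32.update(int) uses is enough
            crc.update(Character.getType(cp));
        }
        return crc.getValue();
    }

    /** Fingerprint of the whole Basic Multilingual Plane. */
    static long bmpFingerprint() {
        return fingerprint(0x0000, 0x10000);
    }

    public static void main(String[] args) {
        System.out.println("java.version=" + System.getProperty("java.version")
            + " BMP category fingerprint=" + Long.toHexString(bmpFingerprint()));
    }
}
```

Stored next to an index and compared at startup, a changed value would flag a JRE switch such as 1.4 to 1.5, where the soft hyphen and Arabic end of ayah examples in this thread changed category. It can't say which documents are affected, only that reindexing deserves a look.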

On Mon, Nov 16, 2009 at 3:33 PM, Uwe Schindler <us...@pangaea.de> wrote:

> Did 1.6 change the unicode version? Robert?


-- 
Robert Muir
rcmuir@gmail.com
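The Character.getType() comparison in this thread can be reproduced on any JRE with a small program like the one below. It is only a sketch, not part of Lucene, and the class name is illustrative: it prints the general category the running JRE assigns to U+00AD SOFT HYPHEN and U+06DD ARABIC END OF AYAH, together with the matching java.lang.Character constant name. For reference, 16 is Character.FORMAT, 20 is Character.DASH_PUNCTUATION and 7 is Character.ENCLOSING_MARK, so the Java 1.4 values reflect the older Unicode 3.0 categories and the Java 5 values the Unicode 4.0 ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: print the Unicode general category the running JRE assigns to the
// two characters discussed in this thread. Class name is illustrative only.
public class UnicodeCategoryCheck {

    // Map a few java.lang.Character general-category constants to readable names.
    static String categoryName(int type) {
        switch (type) {
            case Character.FORMAT:           return "FORMAT (Cf)";
            case Character.DASH_PUNCTUATION: return "DASH_PUNCTUATION (Pd)";
            case Character.ENCLOSING_MARK:   return "ENCLOSING_MARK (Me)";
            case Character.NON_SPACING_MARK: return "NON_SPACING_MARK (Mn)";
            default:                         return "other";
        }
    }

    public static void main(String[] args) {
        // Insertion-ordered so the output lists the soft hyphen first.
        Map<String, Character> chars = new LinkedHashMap<String, Character>();
        chars.put("U+00AD SOFT HYPHEN", '\u00AD');
        chars.put("U+06DD ARABIC END OF AYAH", '\u06DD');

        System.out.println("java.version = " + System.getProperty("java.version"));
        for (Map.Entry<String, Character> e : chars.entrySet()) {
            int type = Character.getType(e.getValue().charValue());
            System.out.println(e.getKey() + " -> " + type + " " + categoryName(type));
        }
    }
}
```

On a Java 5 or later JRE both characters should report 16 (FORMAT), matching the Java 5 values quoted in the thread; running the same class on an old 1.4 JRE is what exposes the difference.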

RE: Why release 3.0?

Posted by Uwe Schindler <us...@pangaea.de>.

Did 1.6 change the unicode version? Robert?

-----
UWE SCHINDLER
Webserver/Middleware Development
PANGAEA - Publishing Network for Geoscientific and Environmental Data
MARUM - University of Bremen
Room 2500, Leobener Str., D-28359 Bremen
Tel.: +49 421 218 65595
Fax:  +49 421 218 65505
http://www.pangaea.de/
E-mail: uschindler@pangaea.de

> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Monday, November 16, 2009 9:30 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Why release 3.0?
> 
> And what happens when someone regenerates it with 1.6 without knowing?




Re: Why release 3.0?

Posted by Mark Miller <ma...@gmail.com>.

And what happens when someone regenerates it with 1.6 without knowing?
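A note such as the proposed jflexregen file only helps if people read it. As a hedged sketch of a complementary approach, the build could refuse to run jflex on the wrong JRE. Everything here is an assumption for illustration, not part of Lucene's build: the class name, the idea of invoking it before jflex, and the required version string "1.5". It compares the standard java.specification.version system property against the version the grammar must be generated with.

```java
// Hypothetical pre-regeneration guard: fail fast if jflex would run on a JRE
// whose Unicode tables differ from the ones the grammar was generated with.
public class JFlexJreGuard {

    // Assumed requirement, for illustration only.
    static final String REQUIRED_SPEC_VERSION = "1.5";

    // True only if the running JRE's specification version matches exactly.
    static boolean isRequiredJre(String actualSpecVersion, String required) {
        return required.equals(actualSpecVersion);
    }

    public static void main(String[] args) {
        String actual = System.getProperty("java.specification.version");
        if (!isRequiredJre(actual, REQUIRED_SPEC_VERSION)) {
            System.err.println("Refusing to regenerate StandardTokenizerImpl: running on Java "
                + actual + ", but the grammar must be generated with Java "
                + REQUIRED_SPEC_VERSION + " (classes such as :letter: and :digit:"
                + " depend on the JRE's Unicode version).");
            System.exit(1);
        }
        System.out.println("JRE " + actual + " OK for jflex regeneration.");
    }
}
```

A build file could perform the same comparison before invoking jflex; the Java form is used here only to keep the sketch self-contained.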

Uwe Schindler wrote:
> I checked this by generating the file with 1.4 and 1.5. The 1.4 version will
> not change anymore, so we just keep the generated Java file and drop its jflex
> source. The old one is used for Lucene up to 2.9; if you use
> matchVersion=LUCENE_30, the new one is used, which can also be regenerated.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>   
>> -----Original Message-----
>> From: Mark Miller [mailto:markrmiller@gmail.com]
>> Sent: Monday, November 16, 2009 9:21 PM
>> To: java-dev@lucene.apache.org
>> Subject: Re: Why release 3.0?
>>
>> Good point - and that likely means the current warning is not working -
>> what can we do to improve it?
>>
>> Perhaps a new text file called jflexregen or something, and it
>> specifically says you must use java 1.5?
>>
>> Uwe Schindler wrote:
>>     
>>> I think the code in Standard has not been generated with 1.4 for
>>> years :-) Most developers use 1.5 or even 1.6, so it has already
>>> changed incompatibly.
>>>
>>>
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de
>>>
>>> ------------------------------------------------------------------------
>>>
>>> *From:* Robert Muir [mailto:rcmuir@gmail.com]
>>> *Sent:* Monday, November 16, 2009 8:52 PM
>>> *To:* java-dev@lucene.apache.org
>>> *Subject:* Re: Why release 3.0?
>>>
>>>
>>>
>>> Uwe, that's probably a good solution, I think, as long as we
>>> document it somewhere. I think there is already some warning
>>> verbiage about this in StandardTokenizer:
>>>
>>> NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
>>>       the tokenizer, remember to use JRE 1.4 to run jflex (before
>>>       Lucene 3.0).  This grammar now uses constructs (eg :digit:,
>>>       :letter:) whose meaning can vary according to the JRE used to
>>>       run jflex.  See
>>>       https://issues.apache.org/jira/browse/LUCENE-1126 for details.
>>>
>>> On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe@thetaphi.de
>>> <ma...@thetaphi.de>> wrote:
>>>
>>> But it is a general warning that should be placed in the Wiki: if you
>>> upgrade from Java 1.4 to Java 5, think about reindexing.
>>>
>>> It definitely has nothing to do with 3.0, because users could have
>>> changed JREs (and most of them have) before.
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> *From:* Robert Muir [mailto:rcmuir@gmail.com <ma...@gmail.com>]
>>> *Sent:* Monday, November 16, 2009 8:45 PM
>>>
>>>
>>> *To:* java-dev@lucene.apache.org <ma...@lucene.apache.org>
>>> *Subject:* Re: Why release 3.0?
>>>
>>>
>>>
>>> right, my point is it's true it's nothing to do with Lucene at all,
>>> really. but the reality is we should clarify this to users, I think.
>>>
>>> It's especially complex in the current StandardTokenizer, which uses a
>>> mix of hardcoded ranges and properties: can you tell me whether you
>>> should reindex for a given language X? I wouldn't want to answer that
>>> question right now.
>>>
>>> On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe@thetaphi.de
>>> <ma...@thetaphi.de>> wrote:
>>>
>>> We tried out: Character.getType() for these two chars:
>>>
>>>
>>>
>>> Java 5:
>>> '\u00AD' = 16
>>> '\u06DD' = 16
>>>
>>> Java 1.4:
>>> '\u00AD' = 20
>>> '\u06DD' = 7
>>>
>>>
>>>
>>> The first is the soft hyphen.
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> *From:* Robert Muir [mailto:rcmuir@gmail.com <ma...@gmail.com>]
>>> *Sent:* Monday, November 16, 2009 8:37 PM
>>>
>>>
>>> *To:* java-dev@lucene.apache.org <ma...@lucene.apache.org>
>>> *Subject:* Re: Why release 3.0?
>>>
>>>
>>>
>>> right, it's nothing to do with lucene; instead it's due to property
>>> changes, etc.
>>>
>>> i just think we should inform users on java 1.4/2.9 that if they
>>> upgrade to java 1.5/3.0, they should reindex.
>>>
>>> the reason i say this about properties is that some of the changes
>>> will affect tokenizers. two examples: a hyphen that changes from
>>> punctuation to format (might affect SolrWordDelimiterFilter), and
>>> arabic ayah, which changes from NSM to format, which surely affects
>>> ArabicLetterTokenizer.
>>>
>>> On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe@syr.edu
>>> <ma...@syr.edu>> wrote:
>>>
>>> Hi Robert,
>>>
>>> I agree that the Unicode version supported by the JVM, as you say,
>>> really has nothing to do with Lucene.
>>>
>>> The disruption here is users' upgrading from Java 1.4 to 1.5+, not
>>> their upgrading Lucene.  I'd guess, with few exceptions, that most
>>> people have been using Lucene with 1.5+ for a couple of years now,
>>> though.
>>>
>>> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact
>>> on most Lucene users, assuming that most use Latin-1 exclusively;
>>> although I haven't looked, I'd be surprised if Latin-1 characters
>>> changed much, if at all, from Unicode 3.0 to 4.0.
>>>
>>> It would be useful, I think, to include (a pointer to?) a description
>>> of the details of the Unicode 3.0->4.0 differences in the Lucene 3.0
>>> release notes, since the minimum required Java version, and so also
>>> the supported Unicode version, changes then.
>>>
>>> Steve
>>>
>>>
>>> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
>>>
>>>> the problem is that the properties have changed for various
>>>> characters, and new characters were added.
>>>>
>>>> it really has nothing to do with lucene, but the idea that you can go
>>>> from jdk 1.4/lucene 2.9 to jdk 1.5/lucene 3.0 without reindexing is
>>>> not true.
>>>>
>>>> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe@thetaphi.de
>>>> <ma...@thetaphi.de>> wrote:
>>>>
>>>>       But a UTF-8 stream from Java 1.4 can still be read with Java 5,
>>>> so what is the problem? Java 5 extended Unicode support, but an index
>>>> created with older versions can still be read. UTF-8 is standardized.
>>>>
>>>>
>>>>
>>>>       -----
>>>>       Uwe Schindler
>>>>       H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>       http://www.thetaphi.de
>>>>       eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>>>>
>>>>
>>>> ________________________________
>>>>
>>>>
>>>>       From: Robert Muir [mailto:rcmuir@gmail.com <ma...@gmail.com>]
>>>>       Sent: Monday, November 16, 2009 8:09 PM
>>>>
>>>>       To: java-dev@lucene.apache.org <mailto:java-dev@lucene.apache.org>
>>>>       Subject: Re: Why release 3.0?
>>>>
>>>>
>>>>
>>>>       uwe, on topic please read my comment on LUCENE-1689, because
>>>> unicode version was bumped in jdk 1.5, i believe this index backwards
>>>> compatibility is only theoretical
>>>>
>>>>       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe@thetaphi.de
>>>> <ma...@thetaphi.de>> wrote:
>>>>
>>>>       2.9 has *not* the same format as 3.0, an index created with 3.0
>>>> cannot be read with 2.9. This is because compressed field support was
>>>> removed and therefore the version number of the stored fields file was
>>>> upgraded. But indexes from 2.9 can be read with 3.0 and support may get
>>>> removed in 4.0. 3.0 Indexes can be read until version 4.9.
>>>>
>>>>
>>>>
>>>>       Uwe
>>>>
>>>>       -----
>>>>       Uwe Schindler
>>>>       H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>       http://www.thetaphi.de
>>>>       eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>>>>
>>>>
>>>> ________________________________
>>>>
>>>>
>>>>       From: Jake Mannix [mailto:jake.mannix@gmail.com <ma...@gmail.com>]
>>>>       Sent: Monday, November 16, 2009 7:15 PM
>>>>
>>>>
>>>>       To: java-dev@lucene.apache.org <mailto:java-dev@lucene.apache.org>
>>>>       Subject: Re: Why release 3.0?
>>>>
>>>>
>>>>
>>>>       Don't users need to upgrade to 3.0 because 3.1 won't be
>>>> necessarily able to read your
>>>>       2.4 index file formats?  I suppose if you've already upgraded to
>>>> 2.9, then all is well because
>>>>       2.9 is the same format as 3.0, but we can't assume all users
>>>> upgraded from 2.4 to 2.9.
>>>>
>>>>       If you've done that already, then 3.0 might not be necessary,
>>>> but if you're on 2.4 right now,
>>>>       you will be in for a bad surprise if you try to upgrade to 3.1.
>>>>
>>>>         -jake
>>>>
>>>>       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
>>>> <erickerickson@gmail.com <ma...@gmail.com>> wrote:
>>>>
>>>>       One of my "specialties" is asking obvious questions just to see
>>>> if everyone's assumptions are aligned. So with the discussion about
>>>> branching 3.0 I have to ask "Is there going to be any 3.0 release
>>>> intended for *production*?". And if not, would we save a lot of
>>>> work by just not worrying about retrofitting fixes to a 3.0 branch
>>>> and carrying on with 3.1 as the first *supported* 3.x release?
>>>>
>>>>       Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
>>>> sure *as a user* I see a good reason to upgrade to 3.0. Getting a
>>>> "beta/snapshot" release to get a head start on cleaning up my code
>>>> does seem worthwhile, if I have the spare time. And having a base
>>>> 3.0 version that's not changing all over the place would be useful
>>>> for that.
>>>>
>>>>       That said, I'm also not terribly comfortable with a "release"
>>>> that's out there and unsupported.
>>>>
>>>>       Apologies if this has already been discussed, but I don't
>>>> remember it. Although my memory isn't what it used to be (but
>>>> some would claim it never was<G>)...
>>>>
>>>>       Erick
>>>>         
>>>
>>>
>>> --
>>> Robert Muir
>>> rcmuir@gmail.com <ma...@gmail.com>
>>>
>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> rcmuir@gmail.com <ma...@gmail.com>
>>>
>>>
>>>
>>>
>>> --
>>> Robert Muir
>>> rcmuir@gmail.com <ma...@gmail.com>
>>>
>>>       
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>     
>
>
>
>
>   


-- 
- Mark

http://www.lucidimagination.com





RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

The way I would handle this: generate the file with both 1.4 and 1.5. The 1.4
version will not change anymore, so we just keep its generated Java file and
no longer ship a jflex file for it. The old one is used for Lucene up to 2.9;
if you use matchVersion=LUCENE_30, the new one is used, which can also be
regenerated.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: Monday, November 16, 2009 9:21 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Why release 3.0?
> 
> Good point - and that likely means the current warning is not working -
> what can we do to improve it?
> 
> Perhaps a new text file called jflexregen or something, and it
> specifically says you must use java 1.5?
> 
> Uwe Schindler wrote:
> >
> > I think the regenerated code in Standard is since years no longer
> > generated with 1.4 :-) Most developers use 1.5 or even 1.6. So it
> > already changed incompatible.
> >
> >
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> > ------------------------------------------------------------------------
> >
> > *From:* Robert Muir [mailto:rcmuir@gmail.com]
> > *Sent:* Monday, November 16, 2009 8:52 PM
> > *To:* java-dev@lucene.apache.org
> > *Subject:* Re: Why release 3.0?
> >
> >
> >
> > Uwe, thats probably a good solution I think. just as long as we
> > document somewhere,
> > I think there is some warning verbage in StandardTokenizer already
> > about this.
> >
> > NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
> >       the tokenizer, remember to use JRE 1.4 to run jflex (before
> >       Lucene 3.0).  This grammar now uses constructs (eg :digit:,
> >       :letter:) whose meaning can vary according to the JRE used to
> >       run jflex.  See
> >       https://issues.apache.org/jira/browse/LUCENE-1126 for details.
> >
> > On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe@thetaphi.de
> > <ma...@thetaphi.de>> wrote:
> >
> > But it is a general warning that should be placed in the Wiki: If you
> > upgrade from Java 1.4 to Java 5, think about reindexing.
> >
> >
> >
> > It has definitely nothing to do with 3.0, because uses could have
> > changed (and most of them have) before.
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de <ma...@thetaphi.de>
> >
> > ------------------------------------------------------------------------
> >
> > *From:* Robert Muir [mailto:rcmuir@gmail.com <ma...@gmail.com>]
> > *Sent:* Monday, November 16, 2009 8:45 PM
> >
> >
> > *To:* java-dev@lucene.apache.org <ma...@lucene.apache.org>
> > *Subject:* Re: Why release 3.0?
> >
> >
> >
> > right, my point is its true its nothing to do with Lucene at all,
> > really.
> >
> > but the reality is we should clarify this to users I think.
> >
> > Its especially complex in the current StandardTokenizer, which uses a
> > mix of hardcoded ranges and properties, can you tell me if you should
> > reindex for given language X?
> > I wouldn't want to answer that question right now.
> >
> > On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe@thetaphi.de
> > <ma...@thetaphi.de>> wrote:
> >
> > We tried out: Character.getType() for these two chars:
> >
> >
> >
> > Java 5:
> > '\u00AD' = 16
> > '\u06DD' = 16
> >
> > Java 1.4:
> > '\u00AD' = 20
> > '\u06DD' = 7
> >
> >
> >
> > The first is the soft hyphen.
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de <ma...@thetaphi.de>
> >
> > ------------------------------------------------------------------------
> >
> > *From:* Robert Muir [mailto:rcmuir@gmail.com <ma...@gmail.com>]
> > *Sent:* Monday, November 16, 2009 8:37 PM
> >
> >
> > *To:* java-dev@lucene.apache.org <ma...@lucene.apache.org>
> > *Subject:* Re: Why release 3.0?
> >
> >
> >
> > right, its nothing to do with lucene, instead due to property changes,
> > etc.
> >
> > i just think we should inform users on java 1.4/2.9 that if they
> > upgrade to java 1.5/3.0, they should reindex.
> >
> > the reason i say this about properties, is there are some that change
> > that will affect tokenizers, i give two examples, a hyphen that
> > changes from punctuation to format (might affect
> > SolrWordDelimiterFilter),
> > and arabic ayah which changes from NSM to format, which surely affects
> > ArabicLetterTokenizer.
> >
> > On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe@syr.edu
> > <ma...@syr.edu>> wrote:
> >
> > Hi Robert,
> >
> > I agree that the Unicode version supported by the JVM, as you say,
> > really has nothing to do with Lucene.
> >
> > The disruption here is users' upgrading from Java 1.4 to 1.5+, not
> > when they upgrade Lucene.  I'd guess with few exceptions that most
> > people have been using Lucene with 1.5+ for a couple of years now,
> > though.
> >
> > But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact
> > on most Lucene users, assuming that most use Latin-1 exclusively;
> > although I haven't looked, I'd be surprised if Latin-1 characters
> > changed much, if at all, from Unicode 3.0 to 4.0.
> >
> > It would be useful, I think, to include (a pointer to?) a description
> > of the details of the Unicode 3.0->4.0 differences in the Lucene 3.0
> > release notes, since the minimum required Java version, and so also
> > the supported Unicode version, changes then.
> >
> > Steve
> >
> >
> > On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> > > the problem is that the properties have changed for various
> > > characters,
> > > and new characters were added.
> > >
> > > it really has nothing to do with lucene, but the idea you can go from
> > > jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not
> > > true.
> > >
> > >
> > > On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe@thetaphi.de
> > > <ma...@thetaphi.de>> wrote:
> > >
> > >
> > >       But an UTF-8 stream from Java 4 can still be read with Java 5,
> > > what is the problem? Java 5 extended Unicode support, but an index
> > > created with older versions can still be read. UTF-8 is standardized.
> > >
> > >
> > >
> > >       -----
> > >       Uwe Schindler
> > >       H.-H.-Meier-Allee 63, D-28213 Bremen
> > >       http://www.thetaphi.de
> > >       eMail: uwe@thetaphi.de <ma...@thetaphi.de>
> > >
> > >
> > > ________________________________
> > >
> > >
> > >       From: Robert Muir [mailto:rcmuir@gmail.com <ma...@gmail.com>]
> > >       Sent: Monday, November 16, 2009 8:09 PM
> > >
> > >       To: java-dev@lucene.apache.org <mailto:java-dev@lucene.apache.org>
> > >       Subject: Re: Why release 3.0?
> > >
> > >
> > >
> > >       uwe, on topic please read my comment on LUCENE-1689, because
> > > unicode version was bumped in jdk 1.5, i believe this index backwards
> > > compatibility is only theoretical
> > >
> > >       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe@thetaphi.de
> > > <ma...@thetaphi.de>> wrote:
> > >
> > >       2.9 has *not* the same format as 3.0, an index created with 3.0
> > > cannot be read with 2.9. This is because compressed field support was
> > > removed and therefore the version number of the stored fields file was
> > > upgraded. But indexes from 2.9 can be read with 3.0 and support may get
> > > removed in 4.0. 3.0 Indexes can be read until version 4.9.
> > >
> > >
> > >
> > >       Uwe
> > >
> > >       -----
> > >       Uwe Schindler
> > >       H.-H.-Meier-Allee 63, D-28213 Bremen
> > >       http://www.thetaphi.de
> > >       eMail: uwe@thetaphi.de <ma...@thetaphi.de>
> > >
> > >
> > > ________________________________
> > >
> > >
> > >       From: Jake Mannix [mailto:jake.mannix@gmail.com <ma...@gmail.com>]
> > >       Sent: Monday, November 16, 2009 7:15 PM
> > >
> > >
> > >       To: java-dev@lucene.apache.org <mailto:java-dev@lucene.apache.org>
> > >
> > >       Subject: Re: Why release 3.0?
> > >
> > >
> > >
> > >       Don't users need to upgrade to 3.0 because 3.1 won't be
> > > necessarily able to read your
> > >       2.4 index file formats?  I suppose if you've already upgraded to
> > > 2.9, then all is well because
> > >       2.9 is the same format as 3.0, but we can't assume all users
> > > upgraded from 2.4 to 2.9.
> > >
> > >       If you've done that already, then 3.0 might not be necessary,
> > > but if you're on 2.4 right now,
> > >       you will be in for a bad surprise if you try to upgrade to 3.1.
> > >
> > >         -jake
> > >
> > >       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> > > <erickerickson@gmail.com <ma...@gmail.com>> wrote:
> > >
> > >       One of my "specialties" is asking obvious questions just to see
> > > if everyone's assumptions are aligned. So with the discussion about
> > > branching 3.0 I have to ask "Is there going to be any 3.0 release
> > > intended for *production*?". And if not, would we save a lot of
> > > work by just not worrying about retrofitting fixes to a 3.0 branch
> > > and carrying on with 3.1 as the first *supported* 3.x release?
> > >
> > >       Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> > > sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> > > "beta/snapshot" release to get a head start on cleaning up my code
> > > does seem worthwhile, if I have the spare time. And having a base
> > > 3.0 version that's not changing all over the place would be useful
> > > for that.
> > >
> > >       That said, I'm also not terribly comfortable with a "release"
> > > that's out there and unsupported.
> > >
> > >       Apologies if this has already been discussed, but I don't
> > > remember it. Although my memory isn't what it used to be (but
> > > some would claim it never was<G>)...
> > >
> > >       Erick
> >
> >
> >
> >
> > --
> > Robert Muir
> > rcmuir@gmail.com <ma...@gmail.com>
> >
> >
> >
> >
> > --
> > Robert Muir
> > rcmuir@gmail.com <ma...@gmail.com>
> >
> >
> >
> >
> > --
> > Robert Muir
> > rcmuir@gmail.com <ma...@gmail.com>
> >
> 
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> 




Re: Why release 3.0?

Posted by Mark Miller <ma...@gmail.com>.

Good point - and that likely means the current warning is not working -
what can we do to improve it?

Perhaps a new text file called jflexregen or something, and it
specifically says you must use java 1.5?

Uwe Schindler wrote:
>
> I think the regenerated code in Standard is since years no longer
> generated with 1.4 :-) Most developers use 1.5 or even 1.6. So it
> already changed incompatible.
>
>  
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> ------------------------------------------------------------------------
>
> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> *Sent:* Monday, November 16, 2009 8:52 PM
> *To:* java-dev@lucene.apache.org
> *Subject:* Re: Why release 3.0?
>
>  
>
> Uwe, thats probably a good solution I think. just as long as we
> document somewhere,
> I think there is some warning verbage in StandardTokenizer already
> about this.
>
> NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
>       the tokenizer, remember to use JRE 1.4 to run jflex (before
>       Lucene 3.0).  This grammar now uses constructs (eg :digit:,
>       :letter:) whose meaning can vary according to the JRE used to
>       run jflex.  See
>       https://issues.apache.org/jira/browse/LUCENE-1126 for details.
>
> On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uwe@thetaphi.de
> <ma...@thetaphi.de>> wrote:
>
> But it is a general warning that should be placed in the Wiki: If you
> upgrade from Java 1.4 to Java 5, think about reindexing.
>
>  
>
> It has definitely nothing to do with 3.0, because uses could have
> changed (and most of them have) before.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>
> ------------------------------------------------------------------------
>
> *From:* Robert Muir [mailto:rcmuir@gmail.com <ma...@gmail.com>]
> *Sent:* Monday, November 16, 2009 8:45 PM
>
>
> *To:* java-dev@lucene.apache.org <ma...@lucene.apache.org>
> *Subject:* Re: Why release 3.0?
>
>  
>
> right, my point is its true its nothing to do with Lucene at all, really.
>
> but the reality is we should clarify this to users I think.
>
> Its especially complex in the current StandardTokenizer, which uses a
> mix of hardcoded ranges and properties, can you tell me if you should
> reindex for given language X?
> I wouldn't want to answer that question right now.
>
> On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uwe@thetaphi.de
> <ma...@thetaphi.de>> wrote:
>
> We tried out: Character.getType() for these two chars:
>
>  
>
> Java 5:
> '\u00AD' = 16
> '\u06DD' = 16
>
> Java 1.4:
> '\u00AD' = 20
> '\u06DD' = 7
>
>  
>
> The first is the soft hyphen.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de <ma...@thetaphi.de>
>
> ------------------------------------------------------------------------
>
> *From:* Robert Muir [mailto:rcmuir@gmail.com <ma...@gmail.com>]
> *Sent:* Monday, November 16, 2009 8:37 PM
>
>
> *To:* java-dev@lucene.apache.org <ma...@lucene.apache.org>
> *Subject:* Re: Why release 3.0?
>
>  
>
> right, its nothing to do with lucene, instead due to property changes,
> etc.
>
> i just think we should inform users on java 1.4/2.9 that if they
> upgrade to java 1.5/3.0, they should reindex.
>
> the reason i say this about properties, is there are some that change
> that will affect tokenizers, i give two examples, a hyphen that
> changes from punctuation to format (might affect SolrWordDelimiterFilter),
> and arabic ayah which changes from NSM to format, which surely affects
> ArabicLetterTokenizer.
>
> On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sarowe@syr.edu
> <ma...@syr.edu>> wrote:
>
> Hi Robert,
>
> I agree that the Unicode version supported by the JVM, as you say,
> really has nothing to do with Lucene.
>
> The disruption here is users' upgrading from Java 1.4 to 1.5+, not
> when they upgrade Lucene.  I'd guess with few exceptions that most
> people have been using Lucene with 1.5+ for a couple of years now, though.
>
> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact
> on most Lucene users, assuming that most use Latin-1 exclusively;
> although I haven't looked, I'd be surprised if Latin-1 characters
> changed much, if at all, from Unicode 3.0 to 4.0.
>
> It would be useful, I think, to include (a pointer to?) a description
> of the details of the Unicode 3.0->4.0 differences in the Lucene 3.0
> release notes, since the minimum required Java version, and so also
> the supported Unicode version, changes then.
>
> Steve
>
>
> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> > the problem is that the properties have changed for various characters,
> > and new characters were added.
> >
> > it really has nothing to do with lucene, but the idea you can go from
> > jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not true.
> >
> >
> > On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uwe@thetaphi.de
> > <ma...@thetaphi.de>> wrote:
> >
> >
> >       But an UTF-8 stream from Java 4 can still be read with Java 5,
> > what is the problem? Java 5 extended Unicode support, but an index
> > created with older versions can still be read. UTF-8 is standardized.
> >
> >
> >
> >       -----
> >       Uwe Schindler
> >       H.-H.-Meier-Allee 63, D-28213 Bremen
> >       http://www.thetaphi.de
> >       eMail: uwe@thetaphi.de <ma...@thetaphi.de>
> >
> >
> > ________________________________
> >
> >
> >       From: Robert Muir [mailto:rcmuir@gmail.com <ma...@gmail.com>]
> >       Sent: Monday, November 16, 2009 8:09 PM
> >
> >       To: java-dev@lucene.apache.org <ma...@lucene.apache.org>
> >       Subject: Re: Why release 3.0?
> >
> >
> >
> >       uwe, on topic please read my comment on LUCENE-1689, because
> > unicode version was bumped in jdk 1.5, i believe this index backwards
> > compatibility is only theoretical
> >
> >       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uwe@thetaphi.de
> > <ma...@thetaphi.de>> wrote:
> >
> >       2.9 has *not* the same format as 3.0, an index created with 3.0
> > cannot be read with 2.9. This is because compressed field support was
> > removed and therefore the version number of the stored fields file was
> > upgraded. But indexes from 2.9 can be read with 3.0 and support may get
> > removed in 4.0. 3.0 Indexes can be read until version 4.9.
> >
> >
> >
> >       Uwe
> >
> >       -----
> >       Uwe Schindler
> >       H.-H.-Meier-Allee 63, D-28213 Bremen
> >       http://www.thetaphi.de
> >       eMail: uwe@thetaphi.de <ma...@thetaphi.de>
> >
> >
> > ________________________________
> >
> >
> >       From: Jake Mannix [mailto:jake.mannix@gmail.com <ma...@gmail.com>]
> >       Sent: Monday, November 16, 2009 7:15 PM
> >
> >
> >       To: java-dev@lucene.apache.org <ma...@lucene.apache.org>
> >
> >       Subject: Re: Why release 3.0?
> >
> >
> >
> >       Don't users need to upgrade to 3.0 because 3.1 won't be
> > necessarily able to read your
> >       2.4 index file formats?  I suppose if you've already upgraded to
> > 2.9, then all is well because
> >       2.9 is the same format as 3.0, but we can't assume all users
> > upgraded from 2.4 to 2.9.
> >
> >       If you've done that already, then 3.0 might not be necessary,
> > but if you're on 2.4 right now,
> >       you will be in for a bad surprise if you try to upgrade to 3.1.
> >
> >         -jake
> >
> >       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> > <erickerickson@gmail.com <ma...@gmail.com>> wrote:
> >
> >       One of my "specialties" is asking obvious questions just to see
> > if everyone's assumptions are aligned. So with the discussion about
> > branching 3.0 I have to ask "Is there going to be any 3.0 release
> > intended for *production*?". And if not, would we save a lot of
> > work by just not worrying about retrofitting fixes to a 3.0 branch
> > and carrying on with 3.1 as the first *supported* 3.x release?
> >
> >       Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> > sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> > "beta/snapshot" release to get a head start on cleaning up my code
> > does seem worthwhile, if I have the spare time. And having a base
> > 3.0 version that's not changing all over the place would be useful
> > for that.
> >
> >       That said, I'm also not terribly comfortable with a "release"
> > that's out there and unsupported.
> >
> >       Apologies if this has already been discussed, but I don't
> > remember it. Although my memory isn't what it used to be (but
> > some would claim it never was<G>)...
> >
> >       Erick
>
>
>
>
> -- 
> Robert Muir
> rcmuir@gmail.com <ma...@gmail.com>
>
>
>
>
> -- 
> Robert Muir
> rcmuir@gmail.com <ma...@gmail.com>
>
>
>
>
> -- 
> Robert Muir
> rcmuir@gmail.com <ma...@gmail.com>
>


-- 
- Mark

http://www.lucidimagination.com





RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

I think the code in Standard has not been regenerated with 1.4 for years :-)
Most developers use 1.5 or even 1.6, so it has already changed incompatibly.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Monday, November 16, 2009 8:52 PM
To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

Uwe, thats probably a good solution I think. just as long as we document
somewhere,
I think there is some warning verbage in StandardTokenizer already about
this.

NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
      the tokenizer, remember to use JRE 1.4 to run jflex (before
      Lucene 3.0).  This grammar now uses constructs (eg :digit:,
      :letter:) whose meaning can vary according to the JRE used to
      run jflex.  See
      https://issues.apache.org/jira/browse/LUCENE-1126 for details.

On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

But it is a general warning that should be placed in the Wiki: If you
upgrade from Java 1.4 to Java 5, think about reindexing.

It has definitely nothing to do with 3.0, because uses could have changed
(and most of them have) before.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Monday, November 16, 2009 8:45 PM

To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

right, my point is its true its nothing to do with Lucene at all, really.

but the reality is we should clarify this to users I think. 

Its especially complex in the current StandardTokenizer, which uses a mix of
hardcoded ranges and properties, can you tell me if you should reindex for
given language X?
I wouldn't want to answer that question right now.

On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

We tried out: Character.getType() for these two chars:

Java 5:
'\u00AD' = 16
'\u06DD' = 16

Java 1.4:
'\u00AD' = 20
'\u06DD' = 7

The first is the soft hyphen.
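Uwe's experiment is easy to repeat on any JVM. The sketch below (the class name CategoryCheck is made up for illustration) prints what Character.getType() reports for the two characters. For reference, the integers in the table above are java.lang.Character constants: 16 is FORMAT, 20 is DASH_PUNCTUATION and 7 is ENCLOSING_MARK. The output depends on the Unicode version of the runtime, so run it under each JVM you care about.

```java
// Hypothetical helper class: prints the general category that this JVM's
// Character.getType() assigns to SOFT HYPHEN and ARABIC END OF AYAH.
public class CategoryCheck {

    // Map the few category constants relevant here back to their names.
    static String categoryName(int type) {
        switch (type) {
            case Character.FORMAT:           return "FORMAT";           // 16, Unicode Cf
            case Character.DASH_PUNCTUATION: return "DASH_PUNCTUATION"; // 20, Unicode Pd
            case Character.ENCLOSING_MARK:   return "ENCLOSING_MARK";   // 7, Unicode Me
            case Character.NON_SPACING_MARK: return "NON_SPACING_MARK"; // 6, Unicode Mn
            default:                         return "OTHER(" + type + ")";
        }
    }

    public static void main(String[] args) {
        char[] chars = { '\u00AD', '\u06DD' }; // SOFT HYPHEN, ARABIC END OF AYAH
        System.out.println("java.version = " + System.getProperty("java.version"));
        for (char c : chars) {
            int type = Character.getType(c);
            System.out.printf("U+%04X -> %d (%s)%n", (int) c, type, categoryName(type));
        }
    }
}
```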

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Monday, November 16, 2009 8:37 PM

To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

right, its nothing to do with lucene, instead due to property changes, etc.

i just think we should inform users on java 1.4/2.9 that if they upgrade to
java 1.5/3.0, they should reindex.

the reason i say this about properties, is there are some that change that
will affect tokenizers, i give two examples, a hyphen that changes from
punctuation to format (might affect SolrWordDelimiterFilter),
and arabic ayah which changes from NSM to format, which surely affects
ArabicLetterTokenizer.
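To make the tokenizer impact concrete, here is a deliberately simplified splitter. It is a hypothetical illustration, not Lucene's StandardTokenizer, ArabicLetterTokenizer or Solr's WordDelimiterFilter: it ends a token on whitespace and punctuation categories as reported by Character.getType(). Because the soft hyphen was DASH_PUNCTUATION on Java 1.4 and is FORMAT on Java 5+ (per Uwe's numbers), the same input yields different tokens depending on the JVM.

```java
import java.util.ArrayList;
import java.util.List;

// Toy splitter (hypothetical, for illustration only) whose output depends on
// the Unicode data of the running JVM via Character.getType().
public class CategorySplitter {

    // Characters in these categories end the current token.
    static boolean isBreak(char c) {
        switch (Character.getType(c)) {
            case Character.SPACE_SEPARATOR:
            case Character.DASH_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false; // letters, digits, marks and FORMAT chars stay in the token
        }
    }

    static List<String> split(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isBreak(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // An ASCII hyphen is DASH_PUNCTUATION on every JVM: always two tokens.
        System.out.println(split("co-operate").size());
        // A soft hyphen: two tokens on Java 1.4 (Pd), one token on Java 5+ (Cf).
        System.out.println(split("co\u00ADoperate").size());
    }
}
```

Any index built with the 1.4 behaviour would contain the two-token form, so queries analyzed on a newer JVM would no longer match those terms, which is the reindexing concern raised above.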

On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sa...@syr.edu> wrote:

Hi Robert,

I agree that the Unicode version supported by the JVM, as you say, really
has nothing to do with Lucene.

The disruption here is users' upgrading from Java 1.4 to 1.5+, not when they
upgrade Lucene.  I'd guess with few exceptions that most people have been
using Lucene with 1.5+ for a couple of years now, though.

But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on
most Lucene users, assuming that most use Latin-1 exclusively; although I
haven't looked, I'd be surprised if Latin-1 characters changed much, if at
all, from Unicode 3.0 to 4.0.

It would be useful, I think, to include (a pointer to?) a description of the
details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release notes,
since the minimum required Java version, and so also the supported Unicode
version, changes then.

Steve
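One way to produce the Unicode 3.0 to 4.0 comparison Steve suggests is to dump the category of every character in a range under each JVM and diff the results. This is a minimal sketch, assuming both a 1.4 and a 5+ runtime are available; the class name CategoryDump is hypothetical. It deliberately avoids generics, for-each, autoboxing and String.format, and casts to char for getType(char), so the same source should compile with javac 1.4. It only covers the BMP (values up to 0xFFFF).

```java
// Hypothetical tool: prints "XXXX<TAB>category" for each code point in a range,
// so the output from two different JVMs can be diffed line by line.
public class CategoryDump {

    static String dump(int from, int to) {
        StringBuffer out = new StringBuffer(); // StringBuffer exists on Java 1.4
        for (int cp = from; cp <= to; cp++) {
            String hex = Integer.toHexString(cp).toUpperCase();
            while (hex.length() < 4) {
                hex = "0" + hex; // zero-pad to four hex digits
            }
            // getType(char) is available on 1.4; getType(int) only since 5.
            out.append(hex).append('\t').append(Character.getType((char) cp)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Latin-1 by default; pass two hex bounds (at most FFFF) for another block.
        int from = args.length == 2 ? Integer.parseInt(args[0], 16) : 0x00;
        int to = args.length == 2 ? Integer.parseInt(args[1], 16) : 0xFF;
        System.out.print(dump(from, to));
    }
}
```

Run it once per JVM, redirect each run to its own file and diff the two; every differing line is a character whose category changed. Going by Uwe's results, the 00AD line should read 20 under 1.4 and 16 under 5+.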

On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> the problem is that the properties have changed for various characters,
> and new characters were added.
>
> it really has nothing to do with lucene, but the idea you can go from
> jdk 1.4/lucene 2.9 to jdk 1.5/lucene3.0 without reindexing is not true.
>
>
> On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>
>
>       But an UTF-8 stream from Java 4 can still be read with Java 5,
> what is the problem? Java 5 extended Unicode support, but an index
> created with older versions can still be read. UTF-8 is standardized.
>
>
>
>       -----
>       Uwe Schindler
>       H.-H.-Meier-Allee 63, D-28213 Bremen
>       http://www.thetaphi.de
>       eMail: uwe@thetaphi.de
>
>
> ________________________________
>
>
>       From: Robert Muir [mailto:rcmuir@gmail.com]
>       Sent: Monday, November 16, 2009 8:09 PM
>
>       To: java-dev@lucene.apache.org
>       Subject: Re: Why release 3.0?
>
>
>
>       uwe, on topic please read my comment on LUCENE-1689, because
> unicode version was bumped in jdk 1.5, i believe this index backwards
> compatibility is only theoretical
>
>       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>
>       2.9 has *not* the same format as 3.0, an index created with 3.0
> cannot be read with 2.9. This is because compressed field support was
> removed and therefore the version number of the stored fields file was
> upgraded. But indexes from 2.9 can be read with 3.0 and support may get
> removed in 4.0. 3.0 Indexes can be read until version 4.9.
>
>
>
>       Uwe
>
>       -----
>       Uwe Schindler
>       H.-H.-Meier-Allee 63, D-28213 Bremen
>       http://www.thetaphi.de
>       eMail: uwe@thetaphi.de
>
>
> ________________________________
>
>
>       From: Jake Mannix [mailto:jake.mannix@gmail.com]
>       Sent: Monday, November 16, 2009 7:15 PM
>
>
>       To: java-dev@lucene.apache.org
>
>       Subject: Re: Why release 3.0?
>
>
>
>       Don't users need to upgrade to 3.0 because 3.1 won't be
> necessarily able to read your
>       2.4 index file formats?  I suppose if you've already upgraded to
> 2.9, then all is well because
>       2.9 is the same format as 3.0, but we can't assume all users
> upgraded from 2.4 to 2.9.
>
>       If you've done that already, then 3.0 might not be necessary,
> but if you're on 2.4 right now,
>       you will be in for a bad surprise if you try to upgrade to 3.1.
>
>         -jake
>
>       On Mon, Nov 16, 2009 at 10:10 AM, Erick Erickson
> <er...@gmail.com> wrote:
>
>       One of my "specialties" is asking obvious questions just to see
> if everyone's assumptions are aligned. So with the discussion about
> branching 3.0 I have to ask "Is there going to be any 3.0 release
> intended for *production*?". And if not, would we save a lot of
> work by just not worrying about retrofitting fixes to a 3.0 branch
> and carrying on with 3.1 as the first *supported* 3.x release?
>
>       Since 3.0 is "upgrade-to-java5 and remove deprecations", I'm not
> sure *as a user* I see a good reason to upgrade to 3.0. Getting a
> "beta/snapshot" release to get a head start on cleaning up my code
> does seem worthwhile, if I have the spare time. And having a base
> 3.0 version that's not changing all over the place would be useful
> for that.
>
>       That said, I'm also not terribly comfortable with a "release"
> that's out there and unsupported.
>
>       Apologies if this has already been discussed, but I don't
> remember it. Although my memory isn't what it used to be (but
> some would claim it never was<G>)...
>
>       Erick

-- 
Robert Muir
rcmuir@gmail.com
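
The read-compatibility rule stated in this thread (a release with major version X reads indexes written by major versions X and X-1, so 2.9 indexes stay readable in 3.x and 3.0 indexes stay readable through 4.9) can be sketched as a simple predicate. This is a minimal illustration under that assumption only: IndexCompat and canRead are hypothetical names, not Lucene API, and the check looks only at major versions, so it cannot express finer limits such as an older 2.x release failing to open an index written by a newer 2.x release.

```java
/**
 * Hypothetical sketch of the major-version rule discussed in the thread:
 * a reader with major version X opens indexes written by major X or X-1.
 * Minor versions are deliberately ignored. Not part of Lucene.
 */
public final class IndexCompat {

    private IndexCompat() {}

    /** True if a reader with the given major version can open the index. */
    public static boolean canRead(int readerMajor, int indexMajor) {
        return indexMajor == readerMajor || indexMajor == readerMajor - 1;
    }

    public static void main(String[] args) {
        System.out.println(canRead(3, 2)); // 3.0 / 3.1 opening a 2.x index
        System.out.println(canRead(4, 2)); // 4.0 may drop 2.x support
        System.out.println(canRead(4, 3)); // 3.0 indexes readable through 4.x
        System.out.println(canRead(2, 3)); // 2.9 opening a 3.0 index
    }
}
```

Read literally, the rule covers 2.4 as well as 2.9, since both belong to major version 2.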

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

Uwe, that's probably a good solution, I think, just as long as we document it
somewhere. I think there is already some warning verbiage about this in
StandardTokenizer:

NOTE: if you change StandardTokenizerImpl.jflex and need to regenerate
      the tokenizer, remember to use JRE 1.4 to run jflex (before
      Lucene 3.0).  This grammar now uses constructs (eg :digit:,
      :letter:) whose meaning can vary according to the JRE used to
      run jflex.  See
      https://issues.apache.org/jira/browse/LUCENE-1126 for details.
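
The NOTE above means the generated scanner depends on which JRE ran jflex. One way to make that requirement harder to miss is a small guard in the build that refuses to regenerate pre-3.0 sources unless the running JRE reports specification version 1.4. This is a hypothetical sketch: JflexJreGuard is not part of Lucene or jflex, and it only reads the standard java.specification.version system property.

```java
/**
 * Hypothetical pre-build guard for regenerating StandardTokenizerImpl from
 * its .jflex grammar. Pre-3.0 sources must be generated under a 1.4 JRE,
 * because :letter: and :digit: follow the running JRE's Unicode data.
 */
public final class JflexJreGuard {

    private JflexJreGuard() {}

    /** True if the given java.specification.version string denotes 1.4. */
    public static boolean isJre14(String specVersion) {
        return "1.4".equals(specVersion);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (!isJre14(spec)) {
            System.err.println("Refusing to run jflex: running JRE spec " + spec
                + ", but pre-3.0 grammars must be generated under 1.4.");
            System.exit(1);
        }
        System.out.println("JRE 1.4 detected; OK to run jflex.");
    }
}
```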

On Mon, Nov 16, 2009 at 2:50 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

>  But it is a general warning that should be placed in the Wiki: If you
> upgrade from Java 1.4 to Java 5, think about reindexing.
>
> It definitely has nothing to do with 3.0, because users could have switched
> JVMs before (and most of them have).
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>   ------------------------------
>
> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> *Sent:* Monday, November 16, 2009 8:45 PM
>
> *To:* java-dev@lucene.apache.org
> *Subject:* Re: Why release 3.0?
>
>
>
> Right, my point is: it's true it has nothing to do with Lucene at all, really.
>
> But the reality is we should clarify this to users, I think.
>
> It's especially complex in the current StandardTokenizer, which uses a mix
> of hardcoded ranges and properties: can you tell me whether you should
> reindex for a given language X?
> I wouldn't want to answer that question right now.
>
> On Mon, Nov 16, 2009 at 2:42 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>
> We tried Character.getType() on these two chars:
>
> Java 5:
> '\u00AD' = 16
> '\u06DD' = 16
>
> Java 1.4:
> '\u00AD' = 20
> '\u06DD' = 7
>
> The first is the soft hyphen.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>    ------------------------------
>
> *From:* Robert Muir [mailto:rcmuir@gmail.com]
> *Sent:* Monday, November 16, 2009 8:37 PM
>
>
> *To:* java-dev@lucene.apache.org
> *Subject:* Re: Why release 3.0?
>
>
>
> Right, it's nothing to do with Lucene; instead it's due to property changes, etc.
>
> I just think we should inform users on Java 1.4/Lucene 2.9 that if they
> upgrade to Java 1.5/Lucene 3.0, they should reindex.
>
> The reason I say this about properties is that some of them change in ways
> that will affect tokenizers. Two examples: a hyphen that changes from
> punctuation to format (might affect SolrWordDelimiterFilter), and the Arabic
> ayah, which changes from NSM to format, which surely affects
> ArabicLetterTokenizer.
>
> On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sa...@syr.edu> wrote:
>
> Hi Robert,
>
> I agree that the Unicode version supported by the JVM, as you say, really
> has nothing to do with Lucene.
>
> The disruption here is users' upgrading from Java 1.4 to 1.5+, not when
> they upgrade Lucene.  I'd guess with few exceptions that most people have
> been using Lucene with 1.5+ for a couple of years now, though.
>
> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on
> most Lucene users, assuming that most use Latin-1 exclusively; although I
> haven't looked, I'd be surprised if Latin-1 characters changed much, if at
> all, from Unicode 3.0 to 4.0.
>
> It would be useful, I think, to include (a pointer to?) a description of
> the details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release
> notes, since the minimum required Java version, and so also the supported
> Unicode version, changes then.
>
> Steve
>
>
> On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> > The problem is that the properties have changed for various characters,
> > and new characters were added.
> >
> > It really has nothing to do with Lucene, but the idea that you can go from
> > JDK 1.4/Lucene 2.9 to JDK 1.5/Lucene 3.0 without reindexing is not true.
> >
> >
> > On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> >
> >
> >       But a UTF-8 stream from Java 1.4 can still be read with Java 5,
> > so what is the problem? Java 5 extended Unicode support, but an index
> > created with older versions can still be read. UTF-8 is standardized.
> >
> >
> >
> >       -----
> >       Uwe Schindler
> >       H.-H.-Meier-Allee 63, D-28213 Bremen
> >       http://www.thetaphi.de
> >       eMail: uwe@thetaphi.de
> >
> >
> > ________________________________
> >
> >
> >       From: Robert Muir [mailto:rcmuir@gmail.com]
> >       Sent: Monday, November 16, 2009 8:09 PM
> >
> >       To: java-dev@lucene.apache.org
> >       Subject: Re: Why release 3.0?
> >
> >
> >
> >       Uwe, on topic, please read my comment on LUCENE-1689: because the
> > Unicode version was bumped in JDK 1.5, I believe this index backwards
> > compatibility is only theoretical.
> >
> >       On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> >
> >       2.9 does *not* have the same format as 3.0: an index created with
> > 3.0 cannot be read with 2.9. This is because compressed field support was
> > removed and therefore the version number of the stored fields file was
> > upgraded. But indexes from 2.9 can be read with 3.0, and support may get
> > removed in 4.0. 3.0 indexes can be read until version 4.9.
> >
> >
> >
> >       Uwe
> >
> >       -----
> >       Uwe Schindler
> >       H.-H.-Meier-Allee 63, D-28213 Bremen
> >       http://www.thetaphi.de
> >       eMail: uwe@thetaphi.de
>
>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>



-- 
Robert Muir
rcmuir@gmail.com

RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

But it is a general warning that should be placed in the Wiki: If you
upgrade from Java 1.4 to Java 5, think about reindexing.

It definitely has nothing to do with 3.0, because users could have switched
JVMs before (and most of them have).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

Right, my point is: it's true it has nothing to do with Lucene at all, really.

But the reality is we should clarify this to users, I think.

It's especially complex in the current StandardTokenizer, which uses a mix of
hardcoded ranges and properties: can you tell me whether you should reindex
for a given language X?
I wouldn't want to answer that question right now.

-- 
Robert Muir
rcmuir@gmail.com

RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

We tried Character.getType() on these two chars:

Java 5:
'\u00AD' = 16
'\u06DD' = 16

Java 1.4:
'\u00AD' = 20
'\u06DD' = 7

The first is the soft hyphen.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
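
The experiment can be reproduced with a short program that prints each value together with its java.lang.Character constant name. For reference, in those constants 16 is FORMAT, 20 is DASH_PUNCTUATION and 7 is ENCLOSING_MARK. The class name is illustrative, and the name table only covers the categories mentioned in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Reproduces the Character.getType() check for the two code points from
 * the thread and prints the numeric category with its constant name.
 */
public final class CharTypeCheck {

    private static final Map<Integer, String> NAMES = new HashMap<Integer, String>();
    static {
        NAMES.put((int) Character.NON_SPACING_MARK, "NON_SPACING_MARK");
        NAMES.put((int) Character.ENCLOSING_MARK, "ENCLOSING_MARK");
        NAMES.put((int) Character.FORMAT, "FORMAT");
        NAMES.put((int) Character.DASH_PUNCTUATION, "DASH_PUNCTUATION");
    }

    private CharTypeCheck() {}

    /** Constant name for a category value, or "type N" if not in the table. */
    public static String typeName(int type) {
        String name = NAMES.get(type);
        return name != null ? name : "type " + type;
    }

    public static void main(String[] args) {
        char[] chars = {'\u00AD', '\u06DD'}; // SOFT HYPHEN, ARABIC END OF AYAH
        for (char c : chars) {
            int type = Character.getType(c);
            System.out.printf("U+%04X = %d (%s)%n", (int) c, type, typeName(type));
        }
    }
}
```

On Java 5 or later both characters are expected to report FORMAT; the exact results still depend on the Unicode data bundled with the JRE that runs it.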

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

Right, it's nothing to do with Lucene; instead it's due to property changes, etc.

I just think we should inform users on Java 1.4/Lucene 2.9 that if they
upgrade to Java 1.5/Lucene 3.0, they should reindex.

The reason I say this about properties is that some of them change in ways
that will affect tokenizers. Two examples: a hyphen that changes from
punctuation to format (might affect SolrWordDelimiterFilter), and the Arabic
ayah, which changes from NSM to format, which surely affects
ArabicLetterTokenizer.
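
To see why a category change can alter tokens, consider a hypothetical letter-based tokenizer that keeps letters and combining marks inside a token and splits on everything else. This is a sketch, not Lucene's ArabicLetterTokenizer; it iterates over UTF-16 chars and ignores supplementary characters. With Java 1.4's data U+06DD was reported as 7 (ENCLOSING_MARK), so this predicate would keep it inside the token; on a JRE that reports FORMAT, the same input splits in two.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical category-driven tokenizer: letters and mark categories are
 * token characters, everything else separates tokens. Its output depends on
 * the Unicode data of the JRE it runs on.
 */
public final class CategoryTokenizer {

    private CategoryTokenizer() {}

    static boolean isTokenChar(char c) {
        if (Character.isLetter(c)) {
            return true;
        }
        int type = Character.getType(c);
        return type == Character.NON_SPACING_MARK
            || type == Character.ENCLOSING_MARK
            || type == Character.COMBINING_SPACING_MARK;
    }

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A non-token char ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // U+06DD between letters: kept if it is a mark, a separator if FORMAT.
        System.out.println(tokenize("ab\u06DDcd"));
    }
}
```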

-- 
Robert Muir
rcmuir@gmail.com

RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

I would rename the Java file/class and put a big warning on it: for Version <
3.0 only, do not regenerate (which cannot be done anyway, because the JFlex
file is missing). The current JFlex file is regenerated and is now the
officially supported Java 1.5 version. The 1.4 version will never change!
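Robert's question (generate a separate scanner and choose it by Version) and Uwe's answer (freeze the old generated class under a new name, with a warning) could be combined roughly as follows. This is a minimal sketch only: `MatchVersion`, `ScannerImpl`, `ScannerImplJava14`, `ScannerImplJava15` and `ScannerSelector` are hypothetical stand-ins, not Lucene's real classes, and the "scanners" here only report which Unicode tables they were generated from.

```java
/** Hypothetical stand-in for Lucene's Version constants. */
enum MatchVersion {
    LUCENE_29, LUCENE_30;

    /** True if this constant is the same as or later than {@code other}. */
    boolean onOrAfter(MatchVersion other) {
        return compareTo(other) >= 0;
    }
}

/** Hypothetical common interface for the JFlex-generated scanners. */
interface ScannerImpl {
    String unicodeTables();
}

/**
 * Frozen copy of the scanner generated under JDK 1.4 (Unicode 3.0).
 * WARNING: for Version < 3.0 only. Do not regenerate; the JFlex file
 * it was generated from is no longer available.
 */
final class ScannerImplJava14 implements ScannerImpl {
    public String unicodeTables() { return "Unicode 3.0 (JDK 1.4)"; }
}

/** Scanner regenerated with JFlex under JDK 1.5 (Unicode 4.0). */
final class ScannerImplJava15 implements ScannerImpl {
    public String unicodeTables() { return "Unicode 4.0 (JDK 1.5)"; }
}

public final class ScannerSelector {
    /** Older matchVersions keep the frozen 1.4 scanner, so token boundaries do not shift. */
    static ScannerImpl forVersion(MatchVersion matchVersion) {
        return matchVersion.onOrAfter(MatchVersion.LUCENE_30)
                ? new ScannerImplJava15()
                : new ScannerImplJava14();
    }

    public static void main(String[] args) {
        for (MatchVersion v : MatchVersion.values()) {
            System.out.println(v + " -> " + forVersion(v).unicodeTables());
        }
    }
}
```

The point of the design is that existing indexes keep being analyzed exactly as they were built, while new code opts into the 1.5 tables by passing a newer Version.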

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Monday, November 16, 2009 9:15 PM
To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

Steven, I think we can be almost sure there are no Latin-1 changes.

What do you think about this JFlex situation, though? It seems like a mess.
Is there anything we can do ahead of the JFlex 1.5 work that is going on now
(which would let us explicitly link Version to the Unicode version JFlex
uses)?

Should we generate a separate JFlex scanner for 3.0 based on the 1.5 JRE and
use it depending on Version for now?

-- 
Robert Muir
rcmuir@gmail.com

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

Steven, I think we can be almost sure there are no Latin-1 changes.

What do you think about this JFlex situation, though? It seems like a mess.
Is there anything we can do ahead of the JFlex 1.5 work that is going on now
(which would let us explicitly link Version to the Unicode version JFlex
uses)?

Should we generate a separate JFlex scanner for 3.0 based on the 1.5 JRE and
use it depending on Version for now?

On Mon, Nov 16, 2009 at 2:33 PM, Steven A Rowe <sa...@syr.edu> wrote:

> Hi Robert,
>
> I agree that the Unicode version supported by the JVM, as you say, really
> has nothing to do with Lucene.
>
> The disruption here is users' upgrading from Java 1.4 to 1.5+, not when
> they upgrade Lucene.  I'd guess with few exceptions that most people have
> been using Lucene with 1.5+ for a couple of years now, though.
>
> But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on
> most Lucene users, assuming that most use Latin-1 exclusively; although I
> haven't looked, I'd be surprised if Latin-1 characters changed much, if at
> all, from Unicode 3.0 to 4.0.
>
> It would be useful, I think, to include (a pointer to?) a description of
> the details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release
> notes, since the minimum required Java version, and so also the supported
> Unicode version, changes then.
>
> Steve


-- 
Robert Muir
rcmuir@gmail.com

RE: Why release 3.0?

Posted by Steven A Rowe <sa...@syr.edu>.

Hi Robert,

I agree that the Unicode version supported by the JVM, as you say, really has nothing to do with Lucene.

The disruption here is users' upgrading from Java 1.4 to 1.5+, not when they upgrade Lucene.  I'd guess with few exceptions that most people have been using Lucene with 1.5+ for a couple of years now, though.
 
But even the upgrade from Java 1.4 to 1.5+ will have (had) zero impact on most Lucene users, assuming that most use Latin-1 exclusively; although I haven't looked, I'd be surprised if Latin-1 characters changed much, if at all, from Unicode 3.0 to 4.0.
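One way to test Steve's guess without reading the Unicode change logs is to fingerprint the general category the running JVM assigns to each Latin-1 character, then compare the value printed on a 1.4 JVM with the one printed on a 1.5+ JVM. A rough sketch, assuming a made-up class name; it uses only `Character.getType(char)` and `java.util.zip.CRC32`, avoids Java 5 syntax so it also compiles for a 1.4 JVM, and checks general categories only, not every character property a tokenizer might consult.

```java
import java.util.zip.CRC32;

/**
 * Prints a checksum of the Unicode general category that this JVM assigns
 * to every Latin-1 character (U+0000..U+00FF). Equal checksums on two JVMs
 * mean the Latin-1 general categories did not change between them.
 */
public final class Latin1CategoryFingerprint {

    /** General category (Character.getType) of each Latin-1 char. */
    static int[] categories() {
        int[] types = new int[256];
        for (int c = 0; c < 256; c++) {
            types[c] = Character.getType((char) c);
        }
        return types;
    }

    /** CRC32 over the category values; every category constant fits in one byte. */
    static long fingerprint() {
        int[] types = categories();
        byte[] bytes = new byte[types.length];
        for (int i = 0; i < types.length; i++) {
            bytes[i] = (byte) types[i];
        }
        CRC32 crc = new CRC32();
        crc.update(bytes);
        return crc.getValue();
    }

    public static void main(String[] args) {
        System.out.println("java.version=" + System.getProperty("java.version")
                + " latin1-categories-crc32=" + Long.toHexString(fingerprint()));
    }
}
```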

It would be useful, I think, to include (a pointer to?) a description of the details of the Unicode 3.0->4.0 differences in the Lucene 3.0 release notes, since the minimum required Java version, and so also the supported Unicode version, changes then.

Steve

On 11/16/2009 at 2:15 PM, Robert Muir wrote:
> The problem is that the properties of various characters have changed,
> and new characters were added.
>
> It really has nothing to do with Lucene, but the idea that you can go from
> JDK 1.4/Lucene 2.9 to JDK 1.5/Lucene 3.0 without reindexing is not true.

RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

But most people already use Java 1.5 or 1.6, even with Lucene 2.9; they could
also have switched earlier. The problem is the JVM in use, not the Lucene
version in use. You can also run Lucene 1.4.3 with Java 5 and get the same
problem. If people change their Java version, they have to take care of what
changed.

The only difference: we are forcing people to use Java 5.
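Since 3.0 is the release that makes Java 5 the minimum, an application could fail fast on an older JVM instead of hitting odd errors later. A minimal sketch, assuming a hypothetical `JavaVersionGuard` helper that is not part of Lucene; the standard `java.specification.version` property reads "1.4", "1.5", "1.6" on those JVMs but "9", "11", "17" from Java 9 onward, so both forms are parsed.

```java
/** Hypothetical startup guard: refuse to run on JVMs older than a given release. */
public final class JavaVersionGuard {

    /** Maps "1.4" -> 4, "1.5" -> 5, "1.6" -> 6, "9" -> 9, "17" -> 17. */
    static int featureVersion(String specVersion) {
        String s = specVersion.trim();
        if (s.startsWith("1.")) {
            s = s.substring(2);           // legacy "1.x" numbering
        }
        int dot = s.indexOf('.');
        if (dot >= 0) {
            s = s.substring(0, dot);      // ignore any further component
        }
        return Integer.parseInt(s);
    }

    /** Throws if the running JVM is older than {@code minimum}. */
    static void requireAtLeast(int minimum) {
        int running = featureVersion(System.getProperty("java.specification.version"));
        if (running < minimum) {
            throw new IllegalStateException("Java " + minimum
                    + " or later is required, but this JVM reports " + running);
        }
    }

    public static void main(String[] args) {
        requireAtLeast(5); // Lucene 3.0 requires Java 5
        System.out.println("JVM spec version OK: "
                + System.getProperty("java.specification.version"));
    }
}
```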

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Monday, November 16, 2009 8:16 PM
To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

 

The problem is that the properties of various characters have changed, and
new characters were added.

It really has nothing to do with Lucene, but the idea that you can go from
JDK 1.4/Lucene 2.9 to JDK 1.5/Lucene 3.0 without reindexing is not true.

-- 
Robert Muir
rcmuir@gmail.com

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

The problem is that the properties of various characters have changed, and
new characters were added.

It really has nothing to do with Lucene, but the idea that you can go from
JDK 1.4/Lucene 2.9 to JDK 1.5/Lucene 3.0 without reindexing is not true.
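Uwe's point (UTF-8 bytes decode the same on any JVM) and Robert's point (what the analyzer then does with the decoded text depends on the JVM's Unicode data) concern different layers. A small illustration, using a made-up class name and U+10400 DESERET CAPITAL LETTER LONG I, a supplementary character added in Unicode 3.1, i.e. after the Unicode 3.0 that JDK 1.4 implements. The code-point overloads such as `Character.isLetter(int)` only exist from Java 5, so char-by-char classification is what 1.4-era code could do.

```java
import java.nio.charset.Charset;

/**
 * UTF-8 round-trips a supplementary character losslessly, but classifying
 * UTF-16 chars one at a time sees two non-letter surrogates, while the
 * Java 5 code-point API sees a single letter.
 */
public final class SupplementaryLetterDemo {

    private static final Charset UTF_8 = Charset.forName("UTF-8");

    /** U+10400 DESERET CAPITAL LETTER LONG I, written as a surrogate pair. */
    static final String DESERET_LONG_I = "\uD801\uDC00";

    /** Encodes to UTF-8 and decodes again; true if nothing was lost. */
    static boolean roundTripsThroughUtf8(String s) {
        byte[] utf8 = s.getBytes(UTF_8);
        return new String(utf8, UTF_8).equals(s);
    }

    /** Pre-Java-5 style: classify each UTF-16 code unit separately. */
    static int lettersByChar(String s) {
        int letters = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                letters++;
            }
        }
        return letters;
    }

    /** Java 5 style: classify whole code points. */
    static int lettersByCodePoint(String s) {
        int letters = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                letters++;
            }
            i += Character.charCount(cp); // advance past both surrogates
        }
        return letters;
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 round trip ok:   " + roundTripsThroughUtf8(DESERET_LONG_I));
        System.out.println("letters by char:       " + lettersByChar(DESERET_LONG_I));
        System.out.println("letters by code point: " + lettersByCodePoint(DESERET_LONG_I));
    }
}
```

The stored bytes are identical either way; only the tokenization decision changes, which is why reindexing, not index-format compatibility, is the concern.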

On Mon, Nov 16, 2009 at 2:12 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

>  But a UTF-8 stream written with Java 1.4 can still be read with Java 5, so
> what is the problem? Java 5 extended its Unicode support, but an index
> created with older versions can still be read. UTF-8 is standardized.
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de



-- 
Robert Muir
rcmuir@gmail.com

RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

But a UTF-8 stream written with Java 1.4 can still be read with Java 5, so what
is the problem? Java 5 extended its Unicode support, but an index created with
older versions can still be read. UTF-8 is standardized.

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Monday, November 16, 2009 8:09 PM
To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

 

Uwe, on topic: please read my comment on LUCENE-1689. Because the Unicode
version was bumped in JDK 1.5, I believe this index backwards compatibility is
only theoretical.

-- 
Robert Muir
rcmuir@gmail.com

Re: Why release 3.0?

Posted by Robert Muir <rc...@gmail.com>.

Uwe, on topic: please read my comment on LUCENE-1689. Because the Unicode
version was bumped in JDK 1.5, I believe this index backwards compatibility is
only theoretical.

On Mon, Nov 16, 2009 at 2:05 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

>  2.9 does **not** have the same format as 3.0: an index created with 3.0
> cannot be read with 2.9. This is because compressed field support was
> removed, and therefore the version number of the stored fields file was
> bumped. But indexes from 2.9 can be read with 3.0, and that support may be
> removed in 4.0. 3.0 indexes can be read until version 4.9.
>
>
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de



-- 
Robert Muir
rcmuir@gmail.com

RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

2.9 does *not* have the same format as 3.0: an index created with 3.0 cannot
be read with 2.9. This is because compressed field support was removed, and
therefore the version number of the stored fields file was bumped. But indexes
from 2.9 can be read with 3.0, and that support may be removed in 4.0. 3.0
indexes can be read until version 4.9.
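The read rules above boil down to a small predicate: a release reads indexes written by its own major version and by the previous one, but never indexes written by a newer release. The sketch below encodes only that stated policy; `IndexCompat` and `Release` are hypothetical names and nothing here is Lucene code. Because Uwe says 2.9 support only "may" be removed in 4.0, a `false` result means "not guaranteed", not "certainly rejected". Under this rule a 2.4 index is still readable by 3.1, consistent with Mark Miller's "X.n must be able to read (X-1).n" earlier in the thread.

```java
/**
 * Hypothetical encoding of the back-compat policy described in the thread:
 * same or previous major version is readable; newer releases are not.
 */
public final class IndexCompat {

    /** Simple major.minor pair, e.g. new Release(2, 9). */
    static final class Release implements Comparable<Release> {
        final int major;
        final int minor;

        Release(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        public int compareTo(Release other) {
            if (major != other.major) {
                return Integer.compare(major, other.major);
            }
            return Integer.compare(minor, other.minor);
        }

        @Override
        public String toString() {
            return major + "." + minor;
        }
    }

    /** True if the policy guarantees {@code reader} can open an index written by {@code writer}. */
    static boolean canRead(Release reader, Release writer) {
        boolean notNewer = writer.compareTo(reader) <= 0;           // 2.9 cannot read a 3.0 index
        boolean withinOneMajor = writer.major >= reader.major - 1;  // 4.x need not read 2.x
        return notNewer && withinOneMajor;
    }

    public static void main(String[] args) {
        Release[][] cases = {
            {new Release(3, 0), new Release(2, 9)},
            {new Release(2, 9), new Release(3, 0)},
            {new Release(3, 1), new Release(2, 4)},
            {new Release(4, 0), new Release(2, 9)},
            {new Release(4, 9), new Release(3, 0)},
        };
        for (Release[] c : cases) {
            System.out.println(c[0] + " reads " + c[1] + ": " + canRead(c[0], c[1]));
        }
    }
}
```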

 

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Jake Mannix [mailto:jake.mannix@gmail.com] 
Sent: Monday, November 16, 2009 7:15 PM
To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

 

Don't users need to upgrade to 3.0 because 3.1 won't necessarily be able to
read your 2.4 index file formats? I suppose if you've already upgraded to 2.9,
then all is well because 2.9 is the same format as 3.0, but we can't assume
all users upgraded from 2.4 to 2.9.

If you've done that already, then 3.0 might not be necessary, but if you're
on 2.4 right now, you will be in for a bad surprise if you try to upgrade to
3.1.

  -jake


Re: Why release 3.0?

Posted by Jake Mannix <ja...@gmail.com>.

Don't users need to upgrade to 3.0 because 3.1 won't necessarily be able to
read your 2.4 index file formats? I suppose if you've already upgraded to 2.9,
then all is well because 2.9 is the same format as 3.0, but we can't assume
all users upgraded from 2.4 to 2.9.

If you've done that already, then 3.0 might not be necessary, but if you're
on 2.4 right now, you will be in for a bad surprise if you try to upgrade to
3.1.

  -jake

RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

We support 3.0; why do you suggest otherwise? I will always fix a bug in 3.0
first and then (perhaps) merge it back to 2.9.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

_____

From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Monday, November 16, 2009 9:13 PM
To: java-dev@lucene.apache.org
Subject: Re: Why release 3.0?

Oops, stupid mouse made me send a blank message.

Ok, I withdraw the question since there *are* good reasons to put
3.0 in a prod environment <G>. It's also an easier thing to say "new Lucene
users should start with 3.0" rather than "new Lucene users should
start with 3.1. Use 3.0 until we release 3.1 but be aware we're not going to
support 3.0...." Yuuuuccckkkk....

Erick

Re: Why release 3.0?

Posted by Erick Erickson <er...@gmail.com>.

Oops, stupid mouse made me send a blank message.

Ok, I withdraw the question since there *are* good reasons to put
3.0 in a prod environment <G>. It's also an easier thing to say "new Lucene
users should start with 3.0" rather than "new Lucene users should
start with 3.1. Use 3.0 until we release 3.1 but be aware we're not going to
support 3.0...." Yuuuuccckkkk....

Erick

On Mon, Nov 16, 2009 at 2:03 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> Hi Erick,
>
> 3.0 is **not** an unsupported or beta release; it is the cleaned-up 2.9.1
> release. You are right that 2.9.1 users do not need to upgrade (but they
> can); for new users starting with Lucene, however, the recommendation is
> to use 3.0 and not 2.9.
>
> 3.0 also contains some cleanups needed for 3.1: compressed fields are no
> longer supported, so they must be uncompressed, which is done during
> optimizing/merging in 3.0. Later versions will remove support for older
> index types, so you should really update your indexes, especially because
> flex indexing will possibly remove more support for older indexes (as it
> gets more complex to maintain all the different file formats).
>
> So 3.0 is recommended for users who are starting new Java 5 projects and
> want a clean API. People needing backwards compatibility can use 2.9.1, but
> support for that version will be discontinued in the future and bug fixes
> will only go into 3.x.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de

Re: Why release 3.0?

Posted by Erick Erickson <er...@gmail.com>.

RE: Why release 3.0?

Posted by Uwe Schindler <uw...@thetaphi.de>.

Hi Erick,

3.0 is *not* an unsupported or beta release; it is the cleaned-up 2.9.1
release. You are right that 2.9.1 users do not need to upgrade (but they
can); for new users starting with Lucene, however, the recommendation is to
use 3.0 and not 2.9.

3.0 also contains some cleanups needed for 3.1: compressed fields are no
longer supported, so they must be uncompressed, which is done during
optimizing/merging in 3.0. Later versions will remove support for older
index types, so you should really update your indexes, especially because
flex indexing will possibly remove more support for older indexes (as it
gets more complex to maintain all the different file formats).

So 3.0 is recommended for users who are starting new Java 5 projects and want
a clean API. People needing backwards compatibility can use 2.9.1, but
support for that version will be discontinued in the future and bug fixes
will only go into 3.x.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
