You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by "Nathanael D. Jones" <na...@gmail.com> on 2010/03/11 19:35:08 UTC

Can 2.3 read indexes created by 2.4?

Lucene 2.4 introduced a change not documented on the File Formats page

*LUCENE-510: The index now stores strings as true UTF-8 bytes (previously it
was Java's modified UTF-8). If any text, either stored fields or a token,
has illegal UTF-16 surrogate characters, these characters are now silently
replaced with the Unicode replacement character U+FFFD. This is a change to
the index file format.*
*(Marvin Humphrey via Mike McCandless)*


Is there a reason this change isn't documened on the File Formats page of
any 2.4+ doc release?

>From the 3.0.1 docs:

*http://lucene.apache.org/java/3_0_1/fileformats.html*<http://lucene.apache.org/java/3_0_1/fileformats.html>
*"**Compatibility notes are provided in this document, describing how file
formats have changed from prior versions.*

*In version 2.1, the file format was changed to allow lock-less commits (ie,
no more commit lock). The change is fully backwards compatible: you can open
a pre-2.1 index for searching or adding/deleting of docs. When the new
segments file is saved (committed), it will be written in the new file
format (meaning no specific "upgrade" process is needed). But note that once
a commit has occurred, pre-2.1 Lucene will not be able to read the index.*

*In version 2.3, the file format was changed to allow segments to share a
single set of doc store (vectors & stored fields) files. This allows for
faster indexing in certain cases. The change is fully backwards compatible
(in the same way as the lock-less commits change in 2.1)."*


But no mention of the unicode change in version 2.4... was it somehow
forwards compatible?

Can I read indexes created in 2.4 with a 2.3 compliant reader?
(Specifically, I'm interested in creating indexes with Lucene 3 and reading
them with CLucene on an iPhone.)

Thanks,

Nathanael

Re: Can 2.3 read indexes created by 2.4?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Urgh, I failed to update the opening in fileformats.html (describing
what's changed on each version).  We also had a change in 3.0, from
removing compressed fields.  I'll fix...

But: 2.3 can't read indexes created with 2.4 (and in general older
Lucene releases very likely will not be able to read indexes created
by newer versions, when there's been an index format change).

Mike

On Thu, Mar 11, 2010 at 1:35 PM, Nathanael D. Jones
<na...@gmail.com> wrote:
> Lucene 2.4 introduced a change not documented on the File Formats page
>
> *LUCENE-510: The index now stores strings as true UTF-8 bytes (previously it
> was Java's modified UTF-8). If any text, either stored fields or a token,
> has illegal UTF-16 surrogate characters, these characters are now silently
> replaced with the Unicode replacement character U+FFFD. This is a change to
> the index file format.*
> *(Marvin Humphrey via Mike McCandless)*
>
>
> Is there a reason this change isn't documened on the File Formats page of
> any 2.4+ doc release?
>
> From the 3.0.1 docs:
>
> *http://lucene.apache.org/java/3_0_1/fileformats.html*<http://lucene.apache.org/java/3_0_1/fileformats.html>
> *"**Compatibility notes are provided in this document, describing how file
> formats have changed from prior versions.*
>
> *In version 2.1, the file format was changed to allow lock-less commits (ie,
> no more commit lock). The change is fully backwards compatible: you can open
> a pre-2.1 index for searching or adding/deleting of docs. When the new
> segments file is saved (committed), it will be written in the new file
> format (meaning no specific "upgrade" process is needed). But note that once
> a commit has occurred, pre-2.1 Lucene will not be able to read the index.*
>
> *In version 2.3, the file format was changed to allow segments to share a
> single set of doc store (vectors & stored fields) files. This allows for
> faster indexing in certain cases. The change is fully backwards compatible
> (in the same way as the lock-less commits change in 2.1)."*
>
>
> But no mention of the unicode change in version 2.4... was it somehow
> forwards compatible?
>
> Can I read indexes created in 2.4 with a 2.3 compliant reader?
> (Specifically, I'm interested in creating indexes with Lucene 3 and reading
> them with CLucene on an iPhone.)
>
> Thanks,
>
> Nathanael
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org