Posted to java-user@lucene.apache.org by Ramprakash Ramamoorthy <yo...@gmail.com> on 2012/12/26 15:51:34 UTC

Negotiating string and byteArray in a migrated index

Dear all,

          We are moving from Lucene 2.3 to 4.1. For the migration, we use
the IndexUpgrader class in org.apache.lucene.index of Lucene 3.6, and then
upgrade the index to 4.0 using the 4.0 IndexUpgrader. [We do this to use the
CompressingStoredFieldsFormat of 4.1; we have pulled the code from trunk for
now, since space is critical for us!]


          The problem is that we are writing the new index with field values
stored as byte[] in Lucene 4.1, and we are achieving good compression of the
index size. When an IndexReader reads across both the old and the new
indices, we have to call doc.get("fieldName") for the older ones and
doc.getBinaryValue("fieldName").utf8ToString() for the newer ones.
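The two read paths can be folded into one helper that branches on the stored type, which is the if/else route in its simplest form. This sketch deliberately avoids Lucene classes so it runs on its own: a stored value is modelled as a plain Object that is either a String (older, upgraded segments) or UTF-8 bytes (new 4.1 segments), and the class and method names are hypothetical. Against a real Lucene 4.x Document, the same branch would presumably check IndexableField.binaryValue() (a BytesRef, null for non-binary fields) before falling back to stringValue().

```java
import java.nio.charset.StandardCharsets;

public class StoredValueNormalizer {

    // Hypothetical helper (not a Lucene API): returns a stored value as a String,
    // whether it came back as a String (older, upgraded segments) or as
    // UTF-8 encoded bytes (segments written by 4.1 with binary stored fields).
    public static String normalize(Object stored) {
        if (stored == null) {
            return null;                                   // field missing in this document
        }
        if (stored instanceof byte[]) {
            return new String((byte[]) stored, StandardCharsets.UTF_8);
        }
        if (stored instanceof String) {
            return (String) stored;
        }
        throw new IllegalArgumentException("unexpected stored type: " + stored.getClass());
    }

    public static void main(String[] args) {
        Object oldValue = "fieldValue";                                   // String, as from an old segment
        Object newValue = "fieldValue".getBytes(StandardCharsets.UTF_8);  // byte[], as from a new segment
        System.out.println(normalize(oldValue).equals(normalize(newValue))); // prints true
    }
}
```

Branching on the type up front avoids using exceptions for control flow, which is the other option mentioned below.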


           One inelegant solution would be to use an if/else, or maybe to
catch an exception and branch accordingly, to normalise the String and
byte[] values. So my question is: is there any way to convert the strings to
byte[] during migration, so that this handling is clean? Or is resorting to
a string index in 4.1 the only solution?
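A sketch of the conversion asked about, again modelling a document as a plain map from field name to stored value instead of using Lucene classes; MigrateToBytes and toBinary are hypothetical names. It assumes the documents are re-read and re-added during migration. In Lucene 4.x each converted value would presumably be written with new StoredField(name, new BytesRef(value)), since IndexUpgrader is meant to rewrite segments into the newer format rather than change how a field's value is stored.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MigrateToBytes {

    // Hypothetical migration step: every String value is re-encoded as UTF-8
    // bytes; values that are already byte[] (or any other type) are copied as-is.
    public static Map<String, Object> toBinary(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object v = e.getValue();
            if (v instanceof String) {
                out.put(e.getKey(), ((String) v).getBytes(StandardCharsets.UTF_8));
            } else {
                out.put(e.getKey(), v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> oldDoc = new LinkedHashMap<>();
        oldDoc.put("fieldName", "caf\u00e9");               // String value from an upgraded segment
        Map<String, Object> migrated = toBinary(oldDoc);
        byte[] bytes = (byte[]) migrated.get("fieldName");
        // Reading back now needs only the byte[] path.
        System.out.println(new String(bytes, StandardCharsets.UTF_8).equals("caf\u00e9")); // prints true
        System.out.println(bytes.length);                    // prints 5: the accented e takes two bytes in UTF-8
    }
}
```

Once every segment holds bytes, the reader only needs the doc.getBinaryValue("fieldName").utf8ToString() path.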

-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
India,
+91 9626975420.

Re: Negotiating string and byteArray in a migrated index

Posted by Ramprakash Ramamoorthy <yo...@gmail.com>.

Sorry, a correction: please read the version below for better understanding.
Sorry again.

On Wed, Dec 26, 2012 at 8:21 PM, Ramprakash Ramamoorthy <
youngestachiever@gmail.com> wrote:

> Dear all,
>
>           We are moving from Lucene 2.3 to 4.1. For the migration, we use
> the IndexUpgrader class in org.apache.lucene.index of Lucene 3.6, and then
> upgrade the index to 4.0 using the 4.0 IndexUpgrader. [We do this to use the
> CompressingStoredFieldsFormat of 4.1; we have pulled the code from trunk for
> now, since space is critical for us!]
>
>
>           The problem is that we are writing the new index with field values
> stored as byte[] in Lucene 4.1, and we are achieving good compression of the
> index size. When an IndexReader reads across both the old and the new
> indices, we have to call doc.get("fieldName") for the older ones and
> doc.getBinaryValue("fieldName").utf8ToString() for the newer indices.
>
>
>            One inelegant solution would be to use an if/else, or maybe to
> catch an exception and branch accordingly, to normalise the String and
> byte[] values. So my question is: is there any way to convert the strings to
> byte[] during migration, so that this handling is clean? Or is resorting to
> a string index in 4.1 the only solution?
>



-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
Engineer Trainee,
Zoho Corporation.
+91 9626975420