Posted to java-user@lucene.apache.org by Ramprakash Ramamoorthy <yo...@gmail.com> on 2013/08/02 13:49:23 UTC

IndexUpgrade - Any ways to speed up?

Team,

        We are migrating from Lucene 2.3.1 to 4.1. We are migrating
the indices as well, in two steps: 2.3.1 to 3.6.2, and then 3.6.2
to 4. We just call IndexUpgrader.upgrade(), using the
UpgradeIndexMergePolicy. I see that the upgrade() method actually calls
forceMerge(1) on the indices.

        However, all our indices are already optimized and contain no
deletes. This forceMerge(1) seems a very costly operation, and since our
index is already optimized, there is no space benefit either. Is there a
faster way to upgrade our indices (such as reading the indices and modifying
the headers, or something of that sort)? We are not expecting any compaction
during the process.

         Currently it takes 4 minutes for a GB of index to get migrated to
4.1 from 2.3.1. Any pointers would be appreciated. Thanks in advance.
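
To put the quoted rate in perspective, here is a minimal arithmetic sketch. It assumes the 4 minutes per GB figure scales linearly with index size, which the thread does not confirm; the class name, method name, and the index sizes in main are hypothetical.

```java
import java.util.Locale;

// Back-of-the-envelope estimate only. ASSUMPTION: upgrade time grows
// linearly with index size at the rate reported in the message above.
final class UpgradeTimeEstimate {

    // Rate reported for the full 2.3.1 -> 3.6.2 -> 4.x migration.
    static final double OBSERVED_MINUTES_PER_GB = 4.0;

    // Returns the estimated wall-clock minutes for an index of the given size.
    static double estimateMinutes(double indexSizeGb, double minutesPerGb) {
        if (indexSizeGb < 0 || minutesPerGb < 0) {
            throw new IllegalArgumentException("size and rate must be non-negative");
        }
        return indexSizeGb * minutesPerGb;
    }

    public static void main(String[] args) {
        double[] sizesGb = {1, 50, 500}; // hypothetical index sizes
        for (double gb : sizesGb) {
            double minutes = estimateMinutes(gb, OBSERVED_MINUTES_PER_GB);
            System.out.printf(Locale.ROOT, "%.0f GB -> about %.0f min (%.1f h)%n",
                    gb, minutes, minutes / 60.0);
        }
    }
}
```

The real rate will depend on disk throughput and on the shape of the index, so measuring on a representative sample is safer than trusting the projection.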


-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
Chennai, India.

Re: IndexUpgrade - Any ways to speed up?

Posted by Ramprakash Ramamoorthy <yo...@gmail.com>.
On Fri, Aug 2, 2013 at 5:56 PM, Shai Erera <se...@gmail.com> wrote:

> Unfortunately you cannot upgrade directly from 2.3.1 to 4.1.
>
> You can consider upgrading to 3.6.2 and stopping there. Lucene 4.1 can read 3.x
> indexes, and when segments are merged, they are upgraded automatically
> to the newest file format.
> However, if this single segment is too big, such that it won't be picked
> for merges, you will need to upgrade it anyway when you one day
> upgrade to Lucene 5.0.
> So I'd say, if you're not stressed with time, upgrade to 4.1 now ... it's a
> one time process.
>

Thank you Shai, doing it right away :) Staying with an older version of
Lucene for such a long time was a bad idea.




-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
Chennai, India

Re: IndexUpgrade - Any ways to speed up?

Posted by Shai Erera <se...@gmail.com>.
Unfortunately you cannot upgrade directly from 2.3.1 to 4.1.

You can consider upgrading to 3.6.2 and stopping there. Lucene 4.1 can read 3.x
indexes, and when segments are merged, they are upgraded automatically
to the newest file format.
However, if this single segment is too big, such that it won't be picked
for merges, you will need to upgrade it anyway when you one day
upgrade to Lucene 5.0.
So I'd say, if you're not pressed for time, upgrade to 4.1 now ... it's a
one-time process.

Shai



Re: IndexUpgrade - Any ways to speed up?

Posted by Ramprakash Ramamoorthy <yo...@gmail.com>.
Thank you Shai for the quick response. Have responded inline.


On Fri, Aug 2, 2013 at 5:37 PM, Shai Erera <se...@gmail.com> wrote:

> Hi
>
> You cannot just update headers -- the file formats have changed. Therefore
> you need to rewrite the index entirely, at least from 2.3.1 to 3.6.2 (for
> 4.1 to be able to read it).
>
Yeah, as of now we call the IndexUpgrader of 3.6.2 and then the IndexUpgrader
of 4.0, after which the indices are readable by 4.1.

> If your index is already optimized, then IndexUpgrader is your best option.
> The reason it calls forceMerge(1) is that it needs to guarantee *every*
> segment in your index gets rewritten.
>
Understood. Looks like we will have to stick with what we have written as of
today.

>
> BTW, you might want to upgrade to 4.4 already.
>
Yeah, we upgraded the code base when 4.1 was the most recent version, and now
we are looking to migrate the older indices to be compatible.
Thanks again.




-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
Chennai, India.

Re: IndexUpgrade - Any ways to speed up?

Posted by Shai Erera <se...@gmail.com>.
Hi

You cannot just update headers -- the file formats have changed. Therefore
you need to rewrite the index entirely, at least from 2.3.1 to 3.6.2 (for
4.1 to be able to read it).
If your index is already optimized, then IndexUpgrader is your best option.
The reason it calls forceMerge(1) is that it needs to guarantee *every*
segment in your index gets rewritten.
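
The two-step rewrite discussed in this thread can be sketched as a small driver that runs each version's IndexUpgrader in its own JVM, since two lucene-core versions cannot easily share one classpath. This is only a sketch: the jar paths and index directory are hypothetical, the class and method names are made up for illustration, and it assumes that lucene-core alone is enough on the classpath for each step. It relies on the command-line entry point org.apache.lucene.index.IndexUpgrader, which takes the index directory as its argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Builds (and optionally runs) one IndexUpgrader command per Lucene version.
// ASSUMPTION: jar locations and the index path below are placeholders.
final class TwoStepUpgradeCommands {

    // Command-line entry point shipped inside lucene-core.
    static final String UPGRADER_MAIN = "org.apache.lucene.index.IndexUpgrader";

    // Command for a single hop, using exactly one lucene-core jar.
    static List<String> upgradeCommand(String luceneCoreJar, String indexDir) {
        return Arrays.asList("java", "-cp", luceneCoreJar, UPGRADER_MAIN, indexDir);
    }

    // One command per jar, in the order given (oldest version first).
    static List<List<String>> plan(String indexDir, String... luceneCoreJars) {
        List<List<String>> steps = new ArrayList<>();
        for (String jar : luceneCoreJars) {
            steps.add(upgradeCommand(jar, indexDir));
        }
        return steps;
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> steps = plan(
                "/path/to/index",              // hypothetical index directory
                "lib/lucene-core-3.6.2.jar",   // hop 1: 2.3.1 -> 3.6.2 format
                "lib/lucene-core-4.1.0.jar");  // hop 2: 3.6.2 -> 4.x format

        // Only prints the commands unless --run is passed explicitly.
        boolean run = args.length > 0 && args[0].equals("--run");
        for (List<String> cmd : steps) {
            System.out.println(String.join(" ", cmd));
            if (run) {
                int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
                if (exit != 0) {
                    throw new IllegalStateException("upgrade step failed: " + cmd);
                }
            }
        }
    }
}
```

Each hop still rewrites every segment, so this does not make the upgrade faster; it only keeps the two library versions apart. Back up the index before passing --run.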

BTW, you might want to upgrade to 4.4 already.

Shai

