Posted to dev@ctakes.apache.org by "Chen, Pei" <Pe...@childrens.harvard.edu> on 2012/11/07 22:05:26 UTC

Upgrading Lucene Indexes

Did you know...?
No need to rebuild Lucene indexes, we can just use the upgrade tool :) with a single command.  I'll give this a try on the OrangeBook and any other indexes created with older versions of Lucene.
This is an easy-to-use tool that upgrades all segments of an index from previous Lucene versions to the current segment file format. It can be used from command line:
  java -cp lucene-core.jar org.apache.lucene.index.IndexUpgrader [-delete-prior-commits] [-verbose] indexDir
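
For anyone with more than one index to convert, here is a rough sketch of wrapping the command above in a loop. The function names, the LUCENE_JAR and DRY_RUN variables, and the directory names passed in are assumptions for illustration, not anything that exists in cTAKES. As far as I know the tool rewrites the index in place, so copying each directory first seems prudent.

```shell
#!/bin/sh
# Sketch only: run IndexUpgrader over several index directories.
# LUCENE_JAR, DRY_RUN, and the function names are hypothetical helpers.

LUCENE_JAR="${LUCENE_JAR:-lucene-core.jar}"

# Print the upgrade command for one index directory.
upgrade_cmd() {
  printf 'java -cp %s org.apache.lucene.index.IndexUpgrader -verbose %s\n' "$LUCENE_JAR" "$1"
}

# Upgrade each directory given as an argument.
# With DRY_RUN=1 (the default) the commands are only printed, not run.
upgrade_all() {
  for dir in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      upgrade_cmd "$dir"
    else
      java -cp "$LUCENE_JAR" org.apache.lucene.index.IndexUpgrader -verbose "$dir" || return 1
    fi
  done
}
```

Calling `upgrade_all OrangeBook` prints the command it would run; setting DRY_RUN=0 actually runs the upgrade and stops at the first failure.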


RE: Upgrading Lucene Indexes

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
This has been done in trunk for future releases, 3.1 and above. All indexes (rxnorm_index, drug_index, snomed_example*, OrangeBook) have been upgraded to the Lucene 4.0.0 format, along with the minor code changes needed to make this work.
Only managed to perform manual testing with the dummy notes. Hopefully, we'll have more test coverage by the time 3.1 is ready.
--Pei

> -----Original Message-----
> From: Steven Bethard [mailto:steven.bethard@Colorado.EDU]
> Sent: Thursday, November 08, 2012 12:32 PM
> To: ctakes-dev@incubator.apache.org
> Subject: Re: Upgrading Lucene Indexes
> 
> On Nov 8, 2012, at 5:35 PM, "Wu, Stephen T., Ph.D."
> <Wu...@mayo.edu> wrote:
> > nice! i think lucene 4.0.0 is worth the switch for downstream tasks/retrieval, but currently we don't use lucene for much.  given that it shouldn't be too much refactoring (james can comment -- he recently did a conversion from 3.6.1 to 4.0.0), i vote we go to 4.0.0.
> 
> I'd also vote for 4.0.0 if it's not too much trouble. It's a lot faster and uses less
> memory, among other things:
> 
> http://ostatic.com/blog/guest-post-under-the-hood-in-apache-lucene-4-0
> 
> Steve
> 
> > ________________________________________
> > From: ctakes-dev-return-825-Wu.Stephen=mayo.edu@incubator.apache.org [ctakes-dev-return-825-Wu.Stephen=mayo.edu@incubator.apache.org] on behalf of Steven Bethard [steven.bethard@Colorado.EDU]
> > Sent: Wednesday, November 07, 2012 10:25 PM
> > To: ctakes-dev@incubator.apache.org
> > Subject: Re: Upgrading Lucene Indexes
> >
> > On Nov 7, 2012, at 10:05 PM, "Chen, Pei" <Pe...@childrens.harvard.edu> wrote:
> >> No need to rebuild Lucene indexes, we can just use the upgrade tool :) with a single command.  I'll give this a try on the OrangeBook and any other indexes created with older versions of Lucene.
> >> This is an easy-to-use tool that upgrades all segments of an index from previous Lucene versions to the current segment file format. It can be used from command line:
> >> java -cp lucene-core.jar org.apache.lucene.index.IndexUpgrader [-delete-prior-commits] [-verbose] indexDir
> >
> > Awesome!
> >
> > What version of Lucene are we upgrading to? 3.6.1? That's the latest from the 3.X series. The current version is 4.0.0, but that probably involves some code updates as well.
> >
> > Keeping up with Lucene releases and API changes is a full time job. ;-)
> >
> > Steve
> >


Re: Upgrading Lucene Indexes

Posted by Steven Bethard <st...@Colorado.EDU>.
On Nov 8, 2012, at 5:35 PM, "Wu, Stephen T., Ph.D." <Wu...@mayo.edu> wrote:
> nice! i think lucene 4.0.0 is worth the switch for downstream tasks/retrieval, but currently we don't use lucene for much.  given that it shouldn't be too much refactoring (james can comment -- he recently did a conversion from 3.6.1 to 4.0.0), i vote we go to 4.0.0.

I'd also vote for 4.0.0 if it's not too much trouble. It's a lot faster and uses less memory, among other things:

http://ostatic.com/blog/guest-post-under-the-hood-in-apache-lucene-4-0

Steve

> ________________________________________
> From: ctakes-dev-return-825-Wu.Stephen=mayo.edu@incubator.apache.org [ctakes-dev-return-825-Wu.Stephen=mayo.edu@incubator.apache.org] on behalf of Steven Bethard [steven.bethard@Colorado.EDU]
> Sent: Wednesday, November 07, 2012 10:25 PM
> To: ctakes-dev@incubator.apache.org
> Subject: Re: Upgrading Lucene Indexes
> 
> On Nov 7, 2012, at 10:05 PM, "Chen, Pei" <Pe...@childrens.harvard.edu> wrote:
>> No need to rebuild Lucene indexes, we can just use the upgrade tool :) with a single command.  I'll give this a try on the OrangeBook and any other indexes created with older versions of Lucene.
>> This is an easy-to-use tool that upgrades all segments of an index from previous Lucene versions to the current segment file format. It can be used from command line:
>> java -cp lucene-core.jar org.apache.lucene.index.IndexUpgrader [-delete-prior-commits] [-verbose] indexDir
> 
> Awesome!
> 
> What version of Lucene are we upgrading to? 3.6.1? That's the latest from the 3.X series. The current version is 4.0.0, but that probably involves some code updates as well.
> 
> Keeping up with Lucene releases and API changes is a full time job. ;-)
> 
> Steve
> 


RE: Upgrading Lucene Indexes

Posted by "Wu, Stephen T., Ph.D." <Wu...@mayo.edu>.
nice! i think lucene 4.0.0 is worth the switch for downstream tasks/retrieval, but currently we don't use lucene for much.  given that it shouldn't be too much refactoring (james can comment -- he recently did a conversion from 3.6.1 to 4.0.0), i vote we go to 4.0.0.

stephen


________________________________________
From: ctakes-dev-return-825-Wu.Stephen=mayo.edu@incubator.apache.org [ctakes-dev-return-825-Wu.Stephen=mayo.edu@incubator.apache.org] on behalf of Steven Bethard [steven.bethard@Colorado.EDU]
Sent: Wednesday, November 07, 2012 10:25 PM
To: ctakes-dev@incubator.apache.org
Subject: Re: Upgrading Lucene Indexes

On Nov 7, 2012, at 10:05 PM, "Chen, Pei" <Pe...@childrens.harvard.edu> wrote:
> No need to rebuild Lucene indexes, we can just use the upgrade tool :) with a single command.  I'll give this a try on the OrangeBook and any other indexes created with older versions of Lucene.
> This is an easy-to-use tool that upgrades all segments of an index from previous Lucene versions to the current segment file format. It can be used from command line:
>  java -cp lucene-core.jar org.apache.lucene.index.IndexUpgrader [-delete-prior-commits] [-verbose] indexDir

Awesome!

What version of Lucene are we upgrading to? 3.6.1? That's the latest from the 3.X series. The current version is 4.0.0, but that probably involves some code updates as well.

Keeping up with Lucene releases and API changes is a full time job. ;-)

Steve


Re: Upgrading Lucene Indexes

Posted by Steven Bethard <st...@Colorado.EDU>.
On Nov 7, 2012, at 10:05 PM, "Chen, Pei" <Pe...@childrens.harvard.edu> wrote:
> No need to rebuild Lucene indexes, we can just use the upgrade tool :) with a single command.  I'll give this a try on the OrangeBook and any other indexes created with older versions of Lucene.
> This is an easy-to-use tool that upgrades all segments of an index from previous Lucene versions to the current segment file format. It can be used from command line:
>  java -cp lucene-core.jar org.apache.lucene.index.IndexUpgrader [-delete-prior-commits] [-verbose] indexDir

Awesome!

What version of Lucene are we upgrading to? 3.6.1? That's the latest from the 3.X series. The current version is 4.0.0, but that probably involves some code updates as well.

Keeping up with Lucene releases and API changes is a full time job. ;-)

Steve