Posted to general@lucene.apache.org by Anshum Gupta <an...@anshumgupta.net> on 2015/02/20 21:54:44 UTC

[ANNOUNCE] Apache Lucene 5.0.0 released

20 February 2015, Apache Lucene™ 5.0.0 available

The Lucene PMC is pleased to announce the release of Apache Lucene 5.0.

Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below. The release
is available for immediate download at:
  http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of
details.

Lucene 5.0 Release Highlights:

Stronger index safety

 * All file access now uses Java’s NIO.2 APIs, which give Lucene stronger
index safety in terms of better error handling and safer commits.

 * Every Lucene segment now stores a unique id per-segment and per-commit
to aid in accurate replication of index files.

 * During merging, IndexWriter now always checks the incoming segments for
corruption before merging. This means that, after upgrading to 5.0.0,
merging may uncover long-standing latent corruption in an older 4.x index.
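
As a rough illustration of the NIO.2 point above, the sketch below uses
only java.nio.file and java.nio.channels from the JDK. It is not Lucene's
code: the class name SafeCommitSketch, the "pending_" prefix, and the file
names are assumptions made for the example. It shows two things NIO.2
makes possible: syncing a file to disk and then renaming it atomically,
and getting a specific exception (NoSuchFileException) rather than a bare
boolean when a delete fails.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene.
class SafeCommitSketch {

    /** Writes data to a pending file, syncs it, then atomically renames it into place. */
    static void commit(Path dir, String name, byte[] data) throws IOException {
        Path pending = dir.resolve("pending_" + name); // illustrative naming only
        try (FileChannel ch = FileChannel.open(pending,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush file content and metadata to the storage device
        }
        // Readers see either no file or the complete file, never a partial one.
        Files.move(pending, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    /** Deletes a file; returns false if it did not exist. Other failures propagate. */
    static boolean deleteIfPresent(Path p) throws IOException {
        try {
            Files.delete(p);
            return true;
        } catch (NoSuchFileException e) {
            return false; // NIO.2 reports the precise cause instead of a plain "false"
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        commit(dir, "segments_1", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.exists(dir.resolve("segments_1")));
    }
}
```

Atomic moves are not supported on every file system; in that case
Files.move throws AtomicMoveNotSupportedException and the caller has to
decide how to proceed.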

Reduced heap usage

 * Lucene now supports random-writable and advance-able sparse bitsets
(RoaringDocIdSet and SparseFixedBitSet), so the heap required is in
proportion to how many bits are set, not how many total documents exist in
the index.

 * Heap usage during IndexWriter merging is also much lower with the new
Lucene50Codec, since doc values and norms for the segments being merged are
no longer fully loaded into heap for all fields; now they are loaded for
the one field currently being merged, and then dropped.

 * The default norms format now uses sparse encoding when appropriate, so
indices that enable norms for many sparse fields will see a large reduction
in required heap at search time.

 * 5.0 has a new API to print a tree structure showing a recursive
breakdown of which parts are using how much heap.
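
To make the "heap in proportion to how many bits are set" idea concrete,
here is a minimal sketch of a sparse bitset in plain Java. It is an
assumption-laden toy, not the layout used by RoaringDocIdSet or
SparseFixedBitSet: the class name SparseBitsSketch and the TreeMap-backed
storage are invented for the example. It supports random writes (set) and
advancing to the next set bit (nextSetBit), and it allocates a 64-bit word
only for ranges that contain at least one set bit.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not a Lucene class.
class SparseBitsSketch {
    // Maps word index (bit index / 64) to a 64-bit word; absent words are all zero.
    private final TreeMap<Integer, Long> words = new TreeMap<>();
    private final int length;

    SparseBitsSketch(int length) {
        this.length = length;
    }

    private void check(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("bit index " + i);
        }
    }

    /** Sets bit i, allocating its word only if needed. */
    void set(int i) {
        check(i);
        words.merge(i >>> 6, 1L << (i & 63), (a, b) -> a | b);
    }

    boolean get(int i) {
        check(i);
        Long w = words.get(i >>> 6);
        return w != null && (w & (1L << (i & 63))) != 0;
    }

    /** Returns the first set bit at or after from, or -1 if there is none. */
    int nextSetBit(int from) {
        if (from < 0 || from >= length) {
            return -1;
        }
        int wordIndex = from >>> 6;
        Long w = words.get(wordIndex);
        if (w != null) {
            long masked = w & (-1L << (from & 63)); // drop bits below from
            if (masked != 0) {
                return (wordIndex << 6) + Long.numberOfTrailingZeros(masked);
            }
        }
        Map.Entry<Integer, Long> e = words.higherEntry(wordIndex);
        return e == null ? -1 : (e.getKey() << 6) + Long.numberOfTrailingZeros(e.getValue());
    }

    /** Number of 64-bit words actually allocated. */
    int allocatedWords() {
        return words.size();
    }

    public static void main(String[] args) {
        SparseBitsSketch bits = new SparseBitsSketch(10_000_000);
        bits.set(3);
        bits.set(1_000_000);
        System.out.println(bits.nextSetBit(4) + " using " + bits.allocatedWords() + " words");
    }
}
```

With two bits set out of 10,000,000, this sketch allocates two words,
where a dense bitset of the same length would need 156,250.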

Other features

 * FieldCache is gone (moved to a dedicated UninvertingReader in the misc
module). This means when you intend to sort on a field, you should index
that field using doc values, which is much faster and less heap consuming
than FieldCache.

 * Tokenizers and Analyzers no longer require a Reader at construction time.

 * NormsFormat now gets its own dedicated NormsConsumer/Producer.

 * SortedSetSortField, used to sort on a multi-valued field, is promoted
from sandbox to Lucene's core.

 * PostingsFormat now uses a "pull" API when writing postings, just like
doc values. This is powerful because you can do things in your postings
format that require making more than one pass through the postings such as
iterating over all postings for each term to decide which compression
format it should use.

 * New DateRangeField type enables indexing and searching of date ranges,
particularly multi-valued ones.

 * A new ExitableDirectoryReader extends FilterDirectoryReader and enables
exiting requests that take too long to enumerate over terms.

 * Suggesters can now be built from multi-valued fields, because
DocumentDictionary now enumerates each value of a multi-valued field
separately.

 * ConcurrentMergeScheduler detects whether the index is on SSD or not and
does a better job defaulting its settings. This only works on Linux for
now; other operating systems will continue to use the previous defaults
(tuned for spinning disks).

 * Auto-IO-throttling has been added to ConcurrentMergeScheduler, to rate
limit IO writes for each merge depending on incoming merge rate.

 * CustomAnalyzer has been added, which lets you configure analyzers as
you do in Solr's index schema. This class has a builder API to configure
Tokenizers, TokenFilters, and CharFilters based on their SPI names and
parameters as documented by the corresponding factories.

 * Memory index now supports payloads.

 * Added a filter cache with a usage tracking policy that caches filters
based on frequency of use.

 * The default codec has an option to choose between BEST_SPEED and
BEST_COMPRESSION for stored fields.

 * Stored fields are merged more efficiently, especially when upgrading
from previous versions or using SortingMergePolicy.
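
The usage-tracking filter cache mentioned above can be pictured with a
small sketch in plain Java. This is not Lucene's implementation: the class
name UsageTrackingPolicySketch, the window size, and the threshold are all
assumptions chosen for the example. The idea it shows is simply that a
filter becomes worth caching only once it has been seen often enough
within a bounded window of recent uses.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not a Lucene class.
class UsageTrackingPolicySketch<K> {
    private final int historySize;   // how many recent uses are remembered
    private final int minFrequency;  // uses within the window needed before caching
    private final ArrayDeque<K> history = new ArrayDeque<>();
    private final Map<K, Integer> counts = new HashMap<>();

    UsageTrackingPolicySketch(int historySize, int minFrequency) {
        if (historySize < 1 || minFrequency < 1) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.historySize = historySize;
        this.minFrequency = minFrequency;
    }

    /** Records one use of key and forgets the oldest use once the window is full. */
    void onUse(K key) {
        history.addLast(key);
        counts.merge(key, 1, Integer::sum);
        if (history.size() > historySize) {
            K oldest = history.removeFirst();
            // Returning null from the remapping function removes the entry.
            counts.computeIfPresent(oldest, (k, c) -> c == 1 ? null : c - 1);
        }
    }

    /** True if key was used at least minFrequency times within the window. */
    boolean shouldCache(K key) {
        return counts.getOrDefault(key, 0) >= minFrequency;
    }

    public static void main(String[] args) {
        UsageTrackingPolicySketch<String> policy = new UsageTrackingPolicySketch<>(4, 2);
        policy.onUse("status:active");
        policy.onUse("status:active");
        System.out.println(policy.shouldCache("status:active"));
    }
}
```

Because old uses fall out of the window, a filter that was popular once
but is no longer used stops qualifying for the cache.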

NOTE: Lucene 5 no longer supports the Lucene 3.x index format. Opening
such indexes will result in an IndexFormatTooOldException. It is recommended
to either reindex all your data, or upgrade the old indexes with the
IndexUpgrader tool of the latest Lucene 4 version (4.10.x). Those indexes can
then be read (see next section) with Lucene 5.

To read more about the changes, also see:
http://blog.mikemccandless.com/2014/11/apache-lucene-500-is-coming.html

Please read CHANGES.txt
(https://lucene.apache.org/core/5_0_0/changes/Changes.html) and MIGRATE.txt
for a full list of new features and notes on upgrading.

Please report any feedback to the mailing lists
(http://lucene.apache.org/core/discussion.html).

-- 
Anshum Gupta
http://about.me/anshumgupta

Re: [ANNOUNCE] Apache Lucene 5.0.0 released

Posted by Dawid Weiss <da...@gmail.com>.

Thanks for contributing time to the release, Anshum.

Dawid

On Fri, Feb 20, 2015 at 10:16 PM, Anshum Gupta <an...@anshumgupta.net> wrote:
> Sure, I'll fix that on the wiki. Thanks for pointing that out Uwe.
>
> On Fri, Feb 20, 2015 at 1:10 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>
>> Many thanks! :-) Nice work!
>>
>> I found a small typo in the announcement text on the mail and web page: "
>> Those indexes can then be read (see next section) with Lucene 5..."
>> The "see next section" should not be there, it's only relevant in the
>> migration guide (because there is a section following). Maybe fix this on
>> the web page, for the mail it's too late.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>
>
> --
> Anshum Gupta
> http://about.me/anshumgupta

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: [ANNOUNCE] Apache Lucene 5.0.0 released

Posted by Anshum Gupta <an...@anshumgupta.net>.

Sure, I'll fix that on the wiki. Thanks for pointing that out, Uwe.

On Fri, Feb 20, 2015 at 1:10 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> Many thanks! :-) Nice work!
>
> I found a small typo in the announcement text on the mail and web page: "
> Those indexes can then be read (see next section) with Lucene 5..."
> The "see next section" should not be there, it's only relevant in the
> migration guide (because there is a section following). Maybe fix this on
> the web page, for the mail it's too late.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>


-- 
Anshum Gupta
http://about.me/anshumgupta

RE: [ANNOUNCE] Apache Lucene 5.0.0 released

Posted by Uwe Schindler <uw...@thetaphi.de>.

Many thanks! :-) Nice work!

I found a small typo in the announcement text in the mail and on the web page: "Those indexes can then be read (see next section) with Lucene 5..."
The "see next section" should not be there; it's only relevant in the migration guide (because there is a section following). Maybe fix this on the web page; for the mail it's too late.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Anshum Gupta [mailto:anshum@anshumgupta.net]
> Sent: Friday, February 20, 2015 9:55 PM
> To: dev@lucene.apache.org; general@lucene.apache.org; java-user@lucene.apache.org
> Subject: [ANNOUNCE] Apache Lucene 5.0.0 released
> 