Posted to java-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2012/05/23 05:15:41 UTC

[Lucene-java Wiki] Update of "ReleaseNote40alpha" by RobertMuir

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The "ReleaseNote40alpha" page has been changed by RobertMuir:
http://wiki.apache.org/lucene-java/ReleaseNote40alpha

Comment:
first stab at a template for 4.0 alpha release notes: please help, but let's keep it concise

New page:
{{{
MMM???? 2012, Apache Lucene™ 4.0-alpha available
The Lucene PMC is pleased to announce the release of Apache Lucene 4.0-alpha.

Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The release
is available for immediate download at:
   http://lucene.apache.org/core/mirrors-core-latest-redir.html (see note below).

See the CHANGES.txt file included with the release for a full list of
details.

Lucene 4.0-alpha Release Highlights:

 * The APIs for accessing terms, postings lists, stored fields, term vectors, etc.
   are pluggable via the Codec API. You can select from the provided
   implementations or customize the index format with your own Codec to meet your needs.

 * Similarity has been decoupled from the vector space model (TF/IDF). Additional models
   such as BM25, Divergence from Randomness, Language Models, and Information-based models
   are provided.

 * Added support for per-document values (DocValues). DocValues can be used for custom 
   scoring factors (accessible via Similarity), for pre-sorted Sort values, and more.

 * When indexing via multiple threads, each IndexWriter thread now flushes its own segment
   to disk concurrently.

 * Per-document normalization factors ("norms") are no longer limited to a single byte.
   Similarity implementations can use any DocValues type to store norms. 

 * Added index statistics such as the number of tokens for a term or field, number of postings
   for a field, and number of documents with a posting for a field: these support additional
   scoring models.

 * Implemented a new default term dictionary/index (BlockTree) that indexes shared prefixes
   instead of every nth term; this is not only more time- and space-efficient, but can
   also sometimes avoid going to disk at all for terms that do not exist. Alternative term
   dictionary implementations are provided and pluggable via the Codec API.

 * Added a number of alternative Codecs and components for different use-cases: "Appending"
   works with append-only filesystems (such as Hadoop DFS), "Memory" writes the entire 
   terms+postings as an FST read into RAM, "Pulsing" inlines the postings for low-frequency 
   terms into the term dictionary, "SimpleText" writes all files in plain-text for easy
   debugging/transparency, among others.

 * Term offsets can be optionally encoded into the postings lists and can be retrieved
   per-position.

 * Various in-memory data structures such as the term dictionary and FieldCache are represented
   more efficiently with less object overhead.

 * Lucene 4.0 provides a modular API, consolidating components such as Analyzers and Queries 
   that were previously scattered across Lucene core, contrib, and Solr. These modules also
   include additional functionality such as UIMA analyzer integration and a completely reworked 
   spatial search implementation.

Please read CHANGES.txt and MIGRATE.txt for a full list of new features and notes on upgrading.
In particular, the new APIs are not compatible with previous versions of Lucene; however, file
format backwards compatibility is provided for indexes from the 3.0 series.

This is an alpha release for early adopters. The guarantee for this alpha release is that the index 
format will be the 4.0 index format, supported through the 5.x series of Apache Lucene, unless there 
is a critical bug (e.g. that would cause index corruption) that would prevent this.

Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html).

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases.  It is possible that the mirror you are using may not
have replicated the release yet.  If that is the case, please try another
mirror.  This also goes for Maven access.
}}}
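
As a rough illustration of the highlights on BM25 and the new index statistics, here is a minimal, self-contained sketch of the textbook BM25 term score. It is not Lucene's implementation and uses no Lucene classes. The class name Bm25Sketch, the constants K1 and B (common textbook defaults), and the example collection sizes are all assumptions made for illustration. The average document length it takes as input is the kind of value that per-field token counts make available.

```java
public final class Bm25Sketch {
    // Common textbook defaults; assumptions for this sketch, not values taken from the release notes.
    public static final double K1 = 1.2;
    public static final double B = 0.75;

    private Bm25Sketch() {}

    /** Inverse document frequency: terms that occur in fewer documents get a larger weight. */
    public static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** BM25 contribution of a single term to a single document's score. */
    public static double score(double termFreq, double docLen, double avgDocLen,
                               long docCount, long docFreq) {
        // Documents longer than average are penalized, shorter ones are boosted.
        double lengthNorm = 1.0 - B + B * (docLen / avgDocLen);
        // The term-frequency part saturates: it approaches (K1 + 1) as termFreq grows.
        double tfPart = termFreq * (K1 + 1.0) / (termFreq + K1 * lengthNorm);
        return idf(docCount, docFreq) * tfPart;
    }

    public static void main(String[] args) {
        // Hypothetical collection: 1000 documents, the term occurs in 10 of them,
        // and the document being scored has exactly the average length of 100 tokens.
        for (int tf : new int[] {1, 2, 10, 100}) {
            System.out.printf("tf=%d score=%.4f%n", tf, score(tf, 100, 100, 1000, 10));
        }
    }
}
```

Running main shows the score rising with term frequency but levelling off, which is one way BM25 differs from weighting schemes where the term-frequency factor grows without bound.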