Posted to commits@lucene.apache.org by us...@apache.org on 2010/05/04 14:04:08 UTC
svn commit: r940818 - /lucene/dev/trunk/lucene/CHANGES.txt
Author: uschindler
Date: Tue May 4 12:04:08 2010
New Revision: 940818
URL: http://svn.apache.org/viewvc?rev=940818&view=rev
Log:
split changes for 3.x and trunk
Modified:
lucene/dev/trunk/lucene/CHANGES.txt
Modified: lucene/dev/trunk/lucene/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/CHANGES.txt?rev=940818&r1=940817&r2=940818&view=diff
==============================================================================
--- lucene/dev/trunk/lucene/CHANGES.txt (original)
+++ lucene/dev/trunk/lucene/CHANGES.txt Tue May 4 12:04:08 2010
@@ -70,6 +70,118 @@ Changes in backwards compatibility polic
TermAttribute/CharTermAttribute. If you want to further filter
or attach Payloads to NTS, use the new NumericTermAttribute.
+* LUCENE-2386: IndexWriter no longer performs an empty commit upon new index
+ creation. Previously, if you passed an empty Directory and set OpenMode to
+ CREATE*, IndexWriter would make a first empty commit. If you need that
+ behavior you can call writer.commit()/close() immediately after you create it.
+ (Shai Erera, Mike McCandless)
+
+* LUCENE-2316: Directory.fileLength contract was clarified - it returns the
+ actual file's length if the file exists, and throws FileNotFoundException
+ otherwise. Returning length=0 for a non-existent file is no longer allowed. If
+ you relied on that, make sure to catch the exception. (Shai Erera)
+
+* LUCENE-2265: FuzzyQuery and WildcardQuery now operate on Unicode codepoints,
+ not Unicode code units. For example, a Wildcard "?" represents any Unicode
+ character. Furthermore, the rest of the automaton package and RegexpQuery use
+ true Unicode codepoint representation. (Robert Muir, Mike McCandless)
+
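The code point versus code unit distinction in LUCENE-2265 can be illustrated with plain JDK string handling. This is only a sketch under that assumption: the class and method names are made up for illustration, and nothing here touches Lucene's automaton package. It shows why a character outside the Basic Multilingual Plane counts as one unit for a code-point-based wildcard even though it occupies two UTF-16 code units.

```java
// Sketch only: plain JDK string handling, not Lucene's automaton code.
// CodePointDemo and its methods are hypothetical names for illustration.
final class CodePointDemo {
    /** Number of UTF-16 code units, which is what String.length() reports. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points; a surrogate pair counts once. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D50A lies outside the BMP, so UTF-16 stores it as a surrogate pair.
        String term = "a\uD835\uDD0Ab";
        System.out.println("code units:  " + codeUnits(term));
        System.out.println("code points: " + codePoints(term));
    }
}
```

Under code-point semantics a pattern such as "a?b" would be expected to match this term, because the middle character is a single code point.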
+Changes in runtime behavior
+
+* LUCENE-2421: NativeFSLockFactory does not throw LockReleaseFailedException if
+ it cannot delete the lock file, since obtaining the lock does not fail if the
+ file is there. (Shai Erera)
+
+API Changes
+
+* LUCENE-1458, LUCENE-2111: The postings APIs (TermEnum, TermDocsEnum,
+ TermPositionsEnum) have been deprecated in favor of the new flexible
+ indexing (flex) APIs (Fields, FieldsEnum, Terms, TermsEnum,
+ DocsEnum, DocsAndPositionsEnum). One big difference is that field
+ and terms are now enumerated separately: a TermsEnum provides a
+ BytesRef (wraps a byte[]) per term within a single field, not a
+ Term. Another is that when asking for a Docs/AndPositionsEnum, you
+ now specify the skipDocs explicitly (typically this will be the
+ deleted docs, but in general you can provide any Bits).
+
+* LUCENE-1458, LUCENE-2111: IndexReader now directly exposes its
+ deleted docs (getDeletedDocs), providing a new Bits interface to
+ directly query by doc ID.
+
+* LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
+ points too. If you use an IndexDeletionPolicy which holds onto index commits
+ (such as SnapshotDeletionPolicy), you can call this method to remove those
+ commit points when they are not needed anymore (instead of waiting for the
+ next commit). (Shai Erera)
+
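The idea of passing skipDocs explicitly, as described in the LUCENE-1458 entries above, can be sketched without Lucene on the classpath. Everything below is an assumption made for illustration: SimpleBits is a hypothetical stand-in for a read-only bit view queried by doc ID, and visibleDocs is not the DocsEnum API. The point is only that the caller decides which documents to skip, and deleted docs are just the typical choice.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Sketch only: SimpleBits and visibleDocs are hypothetical stand-ins,
// not Lucene's Bits interface or its postings enumerators.
final class SkipDocsDemo {
    /** Hypothetical read-only bit view, queried by doc ID. */
    interface SimpleBits {
        boolean get(int docId);
        int length();
    }

    /** Wraps a java.util.BitSet of deleted doc IDs covering maxDoc documents. */
    static SimpleBits fromBitSet(final BitSet deleted, final int maxDoc) {
        return new SimpleBits() {
            @Override public boolean get(int docId) { return deleted.get(docId); }
            @Override public int length() { return maxDoc; }
        };
    }

    /**
     * Returns the doc IDs from postings that are not flagged in skipDocs.
     * In this sketch a null skipDocs means "skip nothing".
     */
    static List<Integer> visibleDocs(int[] postings, SimpleBits skipDocs) {
        List<Integer> out = new ArrayList<Integer>();
        for (int doc : postings) {
            if (skipDocs == null || !skipDocs.get(doc)) {
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(2); // pretend doc 2 was deleted
        System.out.println(visibleDocs(new int[] {0, 2, 5}, fromBitSet(deleted, 6)));
    }
}
```

Because the filter is an ordinary argument, the same enumeration code can serve deleted-doc filtering or any other caller-supplied bit set.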
+New features
+
+* LUCENE-1606, LUCENE-2089: Adds AutomatonQuery, a MultiTermQuery that
+ matches terms against a finite-state machine. Implement WildcardQuery
+ and FuzzyQuery with finite-state methods. Adds RegexpQuery.
+ (Robert Muir, Mike McCandless, Uwe Schindler, Mark Miller)
+
+* LUCENE-1990: Adds internal packed ints implementation, to be used
+ for more efficient storage of int arrays when the values are
+ bounded, for example for storing the terms dict index. (Toke
+ Eskildsen via Mike McCandless)
+
+* LUCENE-2321: Cutover to a more RAM efficient packed-ints based
+ representation for the in-memory terms dict index. (Mike
+ McCandless)
+
+* LUCENE-2126: Add new classes for data (de)serialization: DataInput
+ and DataOutput. IndexInput and IndexOutput extend these new classes.
+ (Michael Busch)
+
+* LUCENE-1458, LUCENE-2111: With flexible indexing it is now possible
+ for an application to create its own postings codec, to alter how
+ fields, terms, docs and positions are encoded into the index. The
+ standard codec is the default codec. Both IndexWriter and
+ IndexReader accept a CodecProvider class to obtain codecs for newly
+ written segments as well as existing segments opened for reading.
+
+* LUCENE-1458, LUCENE-2111: Some experimental codecs have been added
+ for flexible indexing, including pulsing codec (inlines
+ low-frequency terms directly into the terms dict, avoiding seeking
+ for some queries), sep codec (stores docs, freqs, positions, skip
+ data and payloads in 5 separate files instead of the 2 used by
+ standard codec), and int block (really a "base" for using
+ block-based compressors like PForDelta for storing postings data).
+
+* LUCENE-2302, LUCENE-1458, LUCENE-2111: Terms are no longer required
+ to be character based. Lucene views a term as an arbitrary byte[]:
+ during analysis, character-based terms are converted to UTF8 byte[],
+ but analyzers are free to directly create terms as byte[]
+ (NumericField does this, for example). The term data is buffered as
+ byte[] during indexing, written as byte[] into the terms dictionary,
+ and iterated as byte[] (wrapped in a BytesRef) by IndexReader for
+ searching.
+
+* LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
+ can be used to prevent commits from ever getting deleted from the index.
+ (Shai Erera)
+
+* LUCENE-1458, LUCENE-2111: The in-memory terms index used by standard
+ codec is more RAM efficient: terms data is stored as block byte
+ arrays and packed integers. Net RAM reduction for indexes that have
+ many unique terms should be substantial, and initial open time for
+ IndexReaders should be faster. These gains only apply for newly
+ written segments after upgrading.
+
+* LUCENE-1458, LUCENE-2111: Terms data are now buffered directly as
+ byte[] during indexing, which uses half the RAM for ascii terms (and
+ also numeric fields). This can improve indexing throughput for
+ applications that have many unique terms, since it reduces how often
+ a new segment must be flushed given a fixed RAM buffer size.
+
+* LUCENE-2398: Improve tests to work better from IDEs such as Eclipse.
+ (Paolo Castagna via Robert Muir)
+
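The packed ints idea mentioned in LUCENE-1990 and LUCENE-2321 can be sketched as a fixed-width bit packer. This is a minimal illustration under stated assumptions, not Lucene's implementation: PackedSketch, its constructor arguments and ramBytes() are hypothetical names, and the layout (values stored back to back in a long[]) is one simple choice among several.

```java
// Sketch only: a hypothetical fixed-width packer, not Lucene's packed ints classes.
final class PackedSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedSketch(int valueCount, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 32) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..32");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        // Round the total bit count up to a whole number of 64-bit blocks.
        this.blocks = new long[(int) (((long) valueCount * bitsPerValue + 63) / 64)];
    }

    /** Stores the low bitsPerValue bits of value at the given index. */
    void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = value & mask;
        blocks[block] = (blocks[block] & ~(mask << shift)) | (v << shift);
        int spill = shift + bitsPerValue - 64; // bits that overflow into the next block
        if (spill > 0) {
            long highMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (v >>> (bitsPerValue - spill));
        }
    }

    /** Reads back the value stored at the given index. */
    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }

    /** Bytes used by the backing array, ignoring object headers. */
    int ramBytes() {
        return blocks.length * 8;
    }

    public static void main(String[] args) {
        int count = 1000;
        PackedSketch packed = new PackedSketch(count, 10); // values bounded below 1024
        for (int i = 0; i < count; i++) {
            packed.set(i, (i * 7) % 1024);
        }
        System.out.println("packed bytes: " + packed.ramBytes());
        System.out.println("int[] bytes:  " + (count * 4));
    }
}
```

The saving comes entirely from the bound on the values: the tighter the bound, the fewer bits each entry needs compared with a full 32-bit int.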
+======================= Lucene 3.x (not yet released) =======================
+
+Changes in backwards compatibility policy
+
* LUCENE-1483: Removed utility class oal.util.SorterTemplate; this
class is no longer used by Lucene. (Gunnar Wagenknecht via Mike
McCandless)
@@ -117,22 +229,6 @@ Changes in backwards compatibility polic
of incrementToken(), tokenStream(), and reusableTokenStream().
(Uwe Schindler, Robert Muir)
-* LUCENE-2386: IndexWriter no longer performs an empty commit upon new index
- creation. Previously, if you passed an empty Directory and set OpenMode to
- CREATE*, IndexWriter would make a first empty commit. If you need that
- behavior you can call writer.commit()/close() immediately after you create it.
- (Shai Erera, Mike McCandless)
-
-* LUCENE-2316: Directory.fileLength contract was clarified - it returns the
- actual file's length if the file exists, and throws FileNotFoundException
- otherwise. Returning length=0 for a non-existent file is no longer allowed. If
- you relied on that, make sure to catch the exception. (Shai Erera)
-
-* LUCENE-2265: FuzzyQuery and WildcardQuery now operate on Unicode codepoints,
- not Unicode code units. For example, a Wildcard "?" represents any Unicode
- character. Furthermore, the rest of the automaton package and RegexpQuery use
- true Unicode codepoint representation. (Robert Muir, Mike McCandless)
-
Changes in runtime behavior
* LUCENE-1923: Made IndexReader.toString() produce something
@@ -141,10 +237,6 @@ Changes in runtime behavior
* LUCENE-2179: CharArraySet.clear() is now functional.
(Robert Muir, Uwe Schindler)
-* LUCENE-2421: NativeFSLockFactory does not throw LockReleaseFailedException if
- it cannot delete the lock file, since obtaining the lock does not fail if the
- file is there. (Shai Erera)
-
API Changes
* LUCENE-2076: Rename FSDirectory.getFile -> getDirectory. (George
@@ -215,20 +307,6 @@ API Changes
FSDirectory to see a sample of what such tracking might look like, if needed
in your custom Directories. (Earwin Burrfoot via Mike McCandless)
-* LUCENE-1458, LUCENE-2111: The postings APIs (TermEnum, TermDocsEnum,
- TermPositionsEnum) have been deprecated in favor of the new flexible
- indexing (flex) APIs (Fields, FieldsEnum, Terms, TermsEnum,
- DocsEnum, DocsAndPositionsEnum). One big difference is that field
- and terms are now enumerated separately: a TermsEnum provides a
- BytesRef (wraps a byte[]) per term within a single field, not a
- Term. Another is that when asking for a Docs/AndPositionsEnum, you
- now specify the skipDocs explicitly (typically this will be the
- deleted docs, but in general you can provide any Bits).
-
-* LUCENE-1458, LUCENE-2111: IndexReader now directly exposes its
- deleted docs (getDeletedDocs), providing a new Bits interface to
- directly query by doc ID.
-
* LUCENE-2302: Deprecated TermAttribute and replaced by a new
CharTermAttribute. The change is backwards compatible, so
mixed new/old TokenStreams all work on the same char[] buffer
@@ -240,12 +318,6 @@ API Changes
expressions).
(Uwe Schindler, Robert Muir)
-* LUCENE-2402: IndexWriter.deleteUnusedFiles now deletes unreferenced commit
- points too. If you use an IndexDeletionPolicy which holds onto index commits
- (such as SnapshotDeletionPolicy), you can call this method to remove those
- commit points when they are not needed anymore (instead of waiting for the
- next commit). (Shai Erera)
-
Bug fixes
* LUCENE-2119: Don't throw NegativeArraySizeException if you pass
@@ -290,28 +362,9 @@ Bug fixes
is fixed to return null if there are no segments. (Karthick
Sankarachary via Mike McCandless)
-* LUCENE-2222: FixedIntBlockIndexInput incorrectly read one block of
- 0s before the actual data. (Renaud Delbru via Mike McCandless)
-
-* LUCENE-2344: PostingsConsumer.merge was failing to call finishDoc,
- which caused corruption for sep codec. Also fixed several tests to
- test all 4 core codecs. (Renaud Delbru via Mike McCandless)
-
* LUCENE-2074: Reduce buffer size of lexer back to default on reset.
(Ruben Laguna, Shai Erera via Uwe Schindler)
-* LUCENE-2387: Don't hang onto Fieldables from the last doc indexed,
- in IndexWriter, nor the Reader in Tokenizer after close is
- called. (Ruben Laguna, Uwe Schindler, Mike McCandless)
-
-* LUCENE-2417: IndexCommit did not implement hashCode() and equals()
- consistently. Now they both take Directory and version into consideration. In
- addition, all IndexCommit methods which threw
- UnsupportedOperationException are now abstract. (Shai Erera)
-
-* LUCENE-2424: Fix FieldDoc.toString to actually return its fields
- (Stephen Green via Mike McCandless)
-
New features
* LUCENE-2128: Parallelized fetching document frequencies during weight
@@ -376,52 +429,6 @@ New features
files between FSDirectory instances. (Earwin Burrfoot via Mike
McCandless).
-* LUCENE-1606, LUCENE-2089: Adds AutomatonQuery, a MultiTermQuery that
- matches terms against a finite-state machine. Implement WildcardQuery
- and FuzzyQuery with finite-state methods. Adds RegexpQuery.
- (Robert Muir, Mike McCandless, Uwe Schindler, Mark Miller)
-
-* LUCENE-1990: Adds internal packed ints implementation, to be used
- for more efficient storage of int arrays when the values are
- bounded, for example for storing the terms dict index. (Toke
- Eskildsen via Mike McCandless)
-
-* LUCENE-2321: Cutover to a more RAM efficient packed-ints based
- representation for the in-memory terms dict index. (Mike
- McCandless)
-
-* LUCENE-2126: Add new classes for data (de)serialization: DataInput
- and DataOutput. IndexInput and IndexOutput extend these new classes.
- (Michael Busch)
-
-* LUCENE-1458, LUCENE-2111: With flexible indexing it is now possible
- for an application to create its own postings codec, to alter how
- fields, terms, docs and positions are encoded into the index. The
- standard codec is the default codec. Both IndexWriter and
- IndexReader accept a CodecProvider class to obtain codecs for newly
- written segments as well as existing segments opened for reading.
-
-* LUCENE-1458, LUCENE-2111: Some experimental codecs have been added
- for flexible indexing, including pulsing codec (inlines
- low-frequency terms directly into the terms dict, avoiding seeking
- for some queries), sep codec (stores docs, freqs, positions, skip
- data and payloads in 5 separate files instead of the 2 used by
- standard codec), and int block (really a "base" for using
- block-based compressors like PForDelta for storing postings data).
-
-* LUCENE-2302, LUCENE-1458, LUCENE-2111: Terms are no longer required
- to be character based. Lucene views a term as an arbitrary byte[]:
- during analysis, character-based terms are converted to UTF8 byte[],
- but analyzers are free to directly create terms as byte[]
- (NumericField does this, for example). The term data is buffered as
- byte[] during indexing, written as byte[] into the terms dictionary,
- and iterated as byte[] (wrapped in a BytesRef) by IndexReader for
- searching.
-
-* LUCENE-2385: Moved NoDeletionPolicy from benchmark to core. NoDeletionPolicy
- can be used to prevent commits from ever getting deleted from the index.
- (Shai Erera)
-
* LUCENE-2074: Make StandardTokenizer fit for Unicode 4.0, if the
matchVersion parameter is Version.LUCENE_31. (Uwe Schindler)
@@ -492,24 +499,12 @@ Optimizations
because then it will make sense to make the RAM buffers as large as
possible. (Mike McCandless, Michael Busch)
-* LUCENE-1458, LUCENE-2111: The in-memory terms index used by standard
- codec is more RAM efficient: terms data is stored as block byte
- arrays and packed integers. Net RAM reduction for indexes that have
- many unique terms should be substantial, and initial open time for
- IndexReaders should be faster. These gains only apply for newly
- written segments after upgrading.
-
-* LUCENE-1458, LUCENE-2111: Terms data are now buffered directly as
- byte[] during indexing, which uses half the RAM for ascii terms (and
- also numeric fields). This can improve indexing throughput for
- applications that have many unique terms, since it reduces how often
- a new segment must be flushed given a fixed RAM buffer size.
Build
* LUCENE-2124: Moved the JDK-based collation support from contrib/collation
- into core, and moved the ICU-based collation support into contrib/icu.
- (Robert Muir)
+ into core, and moved the ICU-based collation support into contrib/icu.
+ (Robert Muir)
* LUCENE-2326: Removed SVN checkouts for backwards tests. The backwards
branch is now included in the svn repository using "svn copy"
@@ -518,12 +513,6 @@ Build
* LUCENE-2074: Regenerating StandardTokenizerImpl files now needs
JFlex 1.5 (currently only available on SVN). (Uwe Schindler)
-* LUCENE-1709: Tests are now parallelized by default (except for benchmark). You
- can force them to run sequentially by passing -Drunsequential=1 on the command
- line. The number of threads that are spawned per CPU defaults to '1'. If you
- wish to change that, you can run the tests with -DthreadsPerProcessor=[num].
- (Robert Muir, Shai Erera, Peter Kofler)
-
Test Cases
* LUCENE-2037 Allow Junit4 tests in our environment (Erick Erickson
@@ -558,9 +547,6 @@ Test Cases
access to "real" files from the test folder itself, can use
LuceneTestCase(J4).getDataFile(). (Uwe Schindler)
-* LUCENE-2398: Improve tests to work better from IDEs such as Eclipse.
- (Paolo Castagna via Robert Muir)
-
================== Release 2.9.2 / 3.0.1 2010-02-26 ====================
Changes in backwards compatibility policy