Posted to dev@lucene.apache.org by Robert Muir <rc...@gmail.com> on 2012/04/22 23:45:41 UTC

Re: svn commit: r1328978 - in /lucene/dev/trunk/lucene: JRE_VERSION_MIGRATION.txt MIGRATE.txt README.txt build.xml common-build.xml site/xsl/index.xsl

This looks really nice in the index.html; thanks for cleaning up these
files too, Uwe!

On Sun, Apr 22, 2012 at 5:15 PM,  <us...@apache.org> wrote:
> Author: uschindler
> Date: Sun Apr 22 21:15:27 2012
> New Revision: 1328978
>
> URL: http://svn.apache.org/viewvc?rev=1328978&view=rev
> Log:
> LUCENE-4008: Use pegdown to transform MIGRATE.txt and other text-only files into readable HTML. Please also run "ant documentation" when you have changed anything in those files, to check the output.
>
> Modified:
>    lucene/dev/trunk/lucene/JRE_VERSION_MIGRATION.txt
>    lucene/dev/trunk/lucene/MIGRATE.txt
>    lucene/dev/trunk/lucene/README.txt
>    lucene/dev/trunk/lucene/build.xml
>    lucene/dev/trunk/lucene/common-build.xml
>    lucene/dev/trunk/lucene/site/xsl/index.xsl
>
> Modified: lucene/dev/trunk/lucene/JRE_VERSION_MIGRATION.txt
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/JRE_VERSION_MIGRATION.txt?rev=1328978&r1=1328977&r2=1328978&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/JRE_VERSION_MIGRATION.txt (original)
> +++ lucene/dev/trunk/lucene/JRE_VERSION_MIGRATION.txt Sun Apr 22 21:15:27 2012
> @@ -1,36 +1,37 @@
> +# JRE Version Migration Guide
> +
>  If possible, use the same JRE major version at both index and search time.
>  When upgrading to a different JRE major version, consider re-indexing.
>
>  Different JRE major versions may implement different versions of Unicode,
>  which will change the way some parts of Lucene treat your text.
>
> -For example: with Java 1.4, LetterTokenizer will split around the character U+02C6,
> +For example: with Java 1.4, `LetterTokenizer` will split around the character U+02C6,
>  but with Java 5 it will not.
>  This is because Java 1.4 implements Unicode 3, but Java 5 implements Unicode 4.
>
>  For reference, JRE major versions with their corresponding Unicode versions:
> -Java 1.4, Unicode 3.0
> -Java 5, Unicode 4.0
> -Java 6, Unicode 4.0
> -Java 7, Unicode 6.0
> +
> + * Java 1.4, Unicode 3.0
> + * Java 5, Unicode 4.0
> + * Java 6, Unicode 4.0
> + * Java 7, Unicode 6.0
>
>  In general, whether or not you need to re-index largely depends upon the data that
>  you are searching, and what was changed in any given Unicode version. For example,
>  if you are completely sure that your content is limited to the "Basic Latin" range
>  of Unicode, you can safely ignore this.
>
> -Special Notes:
> -
> -LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION
> +## Special Notes: LUCENE 2.9 TO 3.0, JAVA 1.4 TO JAVA 5 TRANSITION
>
> -* StandardAnalyzer will return the same results under Java 5 as it did under
> +* `StandardAnalyzer` will return the same results under Java 5 as it did under
>  Java 1.4. This is because it is largely independent of the runtime JRE for
>  Unicode support, (with the exception of lowercasing).  However, no changes to
>  casing have occurred in Unicode 4.0 that affect StandardAnalyzer, so if you are
>  using this Analyzer you are NOT affected.
>
> -* SimpleAnalyzer, StopAnalyzer, LetterTokenizer, LowerCaseFilter, and
> -LowerCaseTokenizer may return different results, along with many other Analyzers
> -and TokenStreams in Lucene's analysis modules. If you are using one of these
> +* `SimpleAnalyzer`, `StopAnalyzer`, `LetterTokenizer`, `LowerCaseFilter`, and
> +`LowerCaseTokenizer` may return different results, along with many other `Analyzer`s
> +and `TokenStream`s in Lucene's analysis modules. If you are using one of these
>  components, you may be affected.
>
>
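(As an aside on the Unicode note above: a quick, hedged way to see the difference yourself is to check Character.isLetter for U+02C6, since LetterTokenizer splits wherever isLetter returns false. The class name below is made up for illustration and is not part of the commit.)

    // Minimal sketch: prints false on Java 1.4 (Unicode 3.0) and true on
    // Java 5+ (Unicode 4.0), matching the LetterTokenizer behavior described above.
    public class UnicodeLetterCheck {
      public static void main(String[] args) {
        System.out.println(Character.isLetter('\u02C6'));
      }
    }
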
> Modified: lucene/dev/trunk/lucene/MIGRATE.txt
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/MIGRATE.txt?rev=1328978&r1=1328977&r2=1328978&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/MIGRATE.txt (original)
> +++ lucene/dev/trunk/lucene/MIGRATE.txt Sun Apr 22 21:15:27 2012
> @@ -1,138 +1,63 @@
> +# Apache Lucene Migration Guide
>
> -LUCENE-2380: FieldCache.getStrings/Index --> FieldCache.getDocTerms/Index
> +## Four-dimensional enumerations
>
> -  * The field values returned when sorting by SortField.STRING are now
> -    BytesRef.  You can call value.utf8ToString() to convert back to
> -    string, if necessary.
> -
> -  * In FieldCache, getStrings (returning String[]) has been replaced
> -    with getTerms (returning a FieldCache.DocTerms instance).
> -    DocTerms provides a getTerm method, taking a docID and a BytesRef
> -    to fill (which must not be null), and it fills it in with the
> -    reference to the bytes for that term.
> -
> -    If you had code like this before:
> -
> -      String[] values = FieldCache.DEFAULT.getStrings(reader, field);
> -      ...
> -      String aValue = values[docID];
> -
> -    you can do this instead:
> -
> -      DocTerms values = FieldCache.DEFAULT.getTerms(reader, field);
> -      ...
> -      BytesRef term = new BytesRef();
> -      String aValue = values.getTerm(docID, term).utf8ToString();
> -
> -    Note however that it can be costly to convert to String, so it's
> -    better to work directly with the BytesRef.
> -
> -  * Similarly, in FieldCache, getStringIndex (returning a StringIndex
> -    instance, with direct arrays int[] order and String[] lookup) has
> -    been replaced with getTermsIndex (returning a
> -    FieldCache.DocTermsIndex instance).  DocTermsIndex provides the
> -    getOrd(int docID) method to lookup the int order for a document,
> -    lookup(int ord, BytesRef reuse) to lookup the term from a given
> -    order, and the sugar method getTerm(int docID, BytesRef reuse)
> -    which internally calls getOrd and then lookup.
> -
> -    If you had code like this before:
> -
> -      StringIndex idx = FieldCache.DEFAULT.getStringIndex(reader, field);
> -      ...
> -      int ord = idx.order[docID];
> -      String aValue = idx.lookup[ord];
> -
> -    you can do this instead:
> -
> -      DocTermsIndex idx = FieldCache.DEFAULT.getTermsIndex(reader, field);
> -      ...
> -      int ord = idx.getOrd(docID);
> -      BytesRef term = new BytesRef();
> -      String aValue = idx.lookup(ord, term).utf8ToString();
> -
> -    Note however that it can be costly to convert to String, so it's
> -    better to work directly with the BytesRef.
> -
> -    DocTermsIndex also has a getTermsEnum() method, which returns an
> -    iterator (TermsEnum) over the term values in the index (ie,
> -    iterates ord = 0..numOrd()-1).
> -
> -  * StringComparatorLocale is now more CPU costly than it was before
> -    (it was already very CPU costly since it does not compare using
> -    indexed collation keys; use CollationKeyFilter for better
> -    performance), since it converts BytesRef -> String on the fly.
> -    Also, the field values returned when sorting by SortField.STRING
> -    are now BytesRef.
> -
> -  * FieldComparator.StringOrdValComparator has been renamed to
> -    TermOrdValComparator, and now uses BytesRef for its values.
> -    Likewise for StringValComparator, renamed to TermValComparator.
> -    This means when sorting by SortField.STRING or
> -    SortField.STRING_VAL (or directly invoking these comparators) the
> -    values returned in the FieldDoc.fields array will be BytesRef not
> -    String.  You can call the .utf8ToString() method on the BytesRef
> -    instances, if necessary.
> +Flexible indexing changed the low level fields/terms/docs/positions
> +enumeration APIs.  Here are the major changes:
>
> +  * Terms are now binary in nature (arbitrary byte[]), represented
> +    by the BytesRef class (which provides an offset + length "slice"
> +    into an existing byte[]).
>
> -
> -LUCENE-1458, LUCENE-2111: Flexible Indexing
> -
> -  Flexible indexing changed the low level fields/terms/docs/positions
> -  enumeration APIs.  Here are the major changes:
> -
> -    * Terms are now binary in nature (arbitrary byte[]), represented
> -      by the BytesRef class (which provides an offset + length "slice"
> -      into an existing byte[]).
> -
> -    * Fields are separately enumerated (FieldsEnum) from the terms
> -      within each field (TermEnum).  So instead of this:
> +  * Fields are separately enumerated (FieldsEnum) from the terms
> +    within each field (TermEnum).  So instead of this:
>
>         TermEnum termsEnum = ...;
> -       while(termsEnum.next()) {
> -         Term t = termsEnum.term();
> -         System.out.println("field=" + t.field() + "; text=" + t.text());
> +        while(termsEnum.next()) {
> +          Term t = termsEnum.term();
> +          System.out.println("field=" + t.field() + "; text=" + t.text());
>         }
>
> -      Do this:
> -
> +    Do this:
> +
>         FieldsEnum fieldsEnum = ...;
> -       String field;
> -       while((field = fieldsEnum.next()) != null) {
> -         TermsEnum termsEnum = fieldsEnum.terms();
> -         BytesRef text;
> -         while((text = termsEnum.next()) != null) {
> -           System.out.println("field=" + field + "; text=" + text.utf8ToString());
> -         }
> +        String field;
> +        while((field = fieldsEnum.next()) != null) {
> +          TermsEnum termsEnum = fieldsEnum.terms();
> +          BytesRef text;
> +          while((text = termsEnum.next()) != null) {
> +            System.out.println("field=" + field + "; text=" + text.utf8ToString());
> +          }
> +        }
>
> -    * TermDocs is renamed to DocsEnum.  Instead of this:
> +  * TermDocs is renamed to DocsEnum.  Instead of this:
>
>         while(td.next()) {
> -         int doc = td.doc();
> -         ...
> -       }
> +          int doc = td.doc();
> +          ...
> +        }
>
> -      do this:
> +    do this:
>
>         int doc;
> -       while((doc = td.next()) != DocsEnum.NO_MORE_DOCS) {
> -         ...
> -       }
> +        while((doc = td.next()) != DocsEnum.NO_MORE_DOCS) {
> +          ...
> +        }
>
> -      Instead of this:
> -
> +    Instead of this:
> +
>         if (td.skipTo(target)) {
> -         int doc = td.doc();
> -         ...
> -       }
> +          int doc = td.doc();
> +          ...
> +        }
>
> -      do this:
> -
> +    do this:
> +
>         if ((doc=td.advance(target)) != DocsEnum.NO_MORE_DOCS) {
> -         ...
> -       }
> +          ...
> +        }
>
> -      The bulk read API has also changed.  Instead of this:
> +    The bulk read API has also changed.  Instead of this:
>
>         int[] docs = new int[256];
>         int[] freqs = new int[256];
> @@ -145,7 +70,7 @@ LUCENE-1458, LUCENE-2111: Flexible Index
>           // use docs[i], freqs[i]
>         }
>
> -      do this:
> +    do this:
>
>         DocsEnum.BulkReadResult bulk = td.getBulkResult();
>         while(true) {
> @@ -156,319 +81,358 @@ LUCENE-1458, LUCENE-2111: Flexible Index
>           // use bulk.docs.ints[i] and bulk.freqs.ints[i]
>         }
>
> -    * TermPositions is renamed to DocsAndPositionsEnum, and no longer
> -      extends the docs only enumerator (DocsEnum).
> +  * TermPositions is renamed to DocsAndPositionsEnum, and no longer
> +    extends the docs only enumerator (DocsEnum).
>
> -    * Deleted docs are no longer implicitly filtered from
> -      docs/positions enums.  Instead, you pass a Bits
> -      skipDocs (set bits are skipped) when obtaining the enums.  Also,
> -      you can now ask a reader for its deleted docs.
> -
> -    * The docs/positions enums cannot seek to a term.  Instead,
> -      TermsEnum is able to seek, and then you request the
> -      docs/positions enum from that TermsEnum.
> +  * Deleted docs are no longer implicitly filtered from
> +    docs/positions enums.  Instead, you pass a Bits
> +    liveDocs (only documents with a set bit are visited) when obtaining the enums.  Also,
> +    you can now ask a reader for its deleted docs.
> +
> +  * The docs/positions enums cannot seek to a term.  Instead,
> +    TermsEnum is able to seek, and then you request the
> +    docs/positions enum from that TermsEnum.
>
> -    * TermsEnum's seek method returns more information.  So instead of
> -      this:
> +  * TermsEnum's seek method returns more information.  So instead of
> +    this:
>
>         Term t;
>         TermEnum termEnum = reader.terms(t);
> -       if (t.equals(termEnum.term())) {
> -         ...
> +        if (t.equals(termEnum.term())) {
> +          ...
>         }
>
> -      do this:
> +    do this:
>
>         TermsEnum termsEnum = ...;
> -       BytesRef text;
> -       if (termsEnum.seek(text) == TermsEnum.SeekStatus.FOUND) {
> -         ...
> -       }
> -
> -      SeekStatus also contains END (enumerator is done) and NOT_FOUND
> -      (term was not found but enumerator is now positioned to the next
> -      term).
> -
> -    * TermsEnum has an ord() method, returning the long numeric
> -      ordinal (ie, first term is 0, next is 1, and so on) for the term
> -      it's not positioned to.  There is also a corresponding seek(long
> -      ord) method.  Note that these methods are optional; in
> -      particular the MultiFields TermsEnum does not implement them.
> -
> -
> -  How you obtain the enums has changed.  The primary entry point is
> -  the Fields class.  If you know your reader is a single segment
> -  reader, do this:
> -
> -    Fields fields = reader.Fields();
> -    if (fields != null) {
> -      ...
> -    }
> +        BytesRef text;
> +        if (termsEnum.seek(text) == TermsEnum.SeekStatus.FOUND) {
> +          ...
> +        }
>
> -  If the reader might be multi-segment, you must do this:
> +    SeekStatus also contains END (enumerator is done) and NOT_FOUND
> +    (term was not found but enumerator is now positioned to the next
> +    term).
> +
> +  * TermsEnum has an ord() method, returning the long numeric
> +    ordinal (ie, first term is 0, next is 1, and so on) for the term
> +    it is currently positioned on.  There is also a corresponding seek(long
> +    ord) method.  Note that these methods are optional; in
> +    particular the MultiFields TermsEnum does not implement them.
> +
> +
> +  * How you obtain the enums has changed.  The primary entry point is
> +    the Fields class.  If you know your reader is a single segment
> +    reader, do this:
> +
> +        Fields fields = reader.fields();
> +        if (fields != null) {
> +          ...
> +        }
> +
> +    If the reader might be multi-segment, you must do this:
>
> -    Fields fields = MultiFields.getFields(reader);
> -    if (fields != null) {
> -      ...
> -    }
> +        Fields fields = MultiFields.getFields(reader);
> +        if (fields != null) {
> +          ...
> +        }
>
> -  The fields may be null (eg if the reader has no fields).
> +    The fields may be null (eg if the reader has no fields).
>
> -  Note that the MultiFields approach entails a performance hit on
> -  MultiReaders, as it must merge terms/docs/positions on the fly. It's
> -  generally better to instead get the sequential readers (use
> -  oal.util.ReaderUtil) and then step through those readers yourself,
> -  if you can (this is how Lucene drives searches).
> +    Note that the MultiFields approach entails a performance hit on
> +    MultiReaders, as it must merge terms/docs/positions on the fly. It's
> +    generally better to instead get the sequential readers (use
> +    oal.util.ReaderUtil) and then step through those readers yourself,
> +    if you can (this is how Lucene drives searches).
> +
> +    If you pass a SegmentReader to MultiFields.getFields it will simply
> +    return reader.fields(), so there is no performance hit in that
> +    case.
> +
> +    Once you have a non-null Fields you can do this:
> +
> +        Terms terms = fields.terms("field");
> +        if (terms != null) {
> +          ...
> +        }
>
> -  If you pass a SegmentReader to MultiFields.fiels it will simply
> -  return reader.fields(), so there is no performance hit in that
> -  case.
> +    The terms may be null (eg if the field does not exist).
>
> -  Once you have a non-null Fields you can do this:
> +    Once you have a non-null terms you can get an enum like this:
>
> -    Terms terms = fields.terms("field");
> -    if (terms != null) {
> -      ...
> -    }
> +        TermsEnum termsEnum = terms.iterator();
>
> -  The terms may be null (eg if the field does not exist).
> +    The returned TermsEnum will not be null.
>
> -  Once you have a non-null terms you can get an enum like this:
> +    You can then .next() through the TermsEnum, or seek.  If you want a
> +    DocsEnum, do this:
>
> -    TermsEnum termsEnum = terms.iterator();
> +        Bits liveDocs = reader.getLiveDocs();
> +        DocsEnum docsEnum = null;
>
> -  The returned TermsEnum will not be null.
> +        docsEnum = termsEnum.docs(liveDocs, docsEnum);
>
> -  You can then .next() through the TermsEnum, or seek.  If you want a
> -  DocsEnum, do this:
> +    You can pass in a prior DocsEnum and it will be reused if possible.
>
> -    Bits liveDocs = reader.getLiveDocs();
> -    DocsEnum docsEnum = null;
> +    Likewise for DocsAndPositionsEnum.
>
> -    docsEnum = termsEnum.docs(liveDocs, docsEnum);
> +    IndexReader has several sugar methods (which just go through the
> +    above steps, under the hood).  Instead of:
>
> -  You can pass in a prior DocsEnum and it will be reused if possible.
> +        Term t;
> +        TermDocs termDocs = reader.termDocs();
> +        termDocs.seek(t);
>
> -  Likewise for DocsAndPositionsEnum.
> +    do this:
>
> -  IndexReader has several sugar methods (which just go through the
> -  above steps, under the hood).  Instead of:
> +        String field;
> +        BytesRef text;
> +        DocsEnum docsEnum = reader.termDocsEnum(reader.getLiveDocs(), field, text);
>
> -    Term t;
> -    TermDocs termDocs = reader.termDocs();
> -    termDocs.seek(t);
> +    Likewise for DocsAndPositionsEnum.
>
> -  do this:
> +## LUCENE-2380: FieldCache.getStrings/Index --> FieldCache.getDocTerms/Index
>
> -    String field;
> -    BytesRef text;
> -    DocsEnum docsEnum = reader.termDocsEnum(reader.getLiveDocs(), field, text);
> +  * The field values returned when sorting by SortField.STRING are now
> +    BytesRef.  You can call value.utf8ToString() to convert back to
> +    string, if necessary.
>
> -  Likewise for DocsAndPositionsEnum.
> +  * In FieldCache, getStrings (returning String[]) has been replaced
> +    with getTerms (returning a FieldCache.DocTerms instance).
> +    DocTerms provides a getTerm method, taking a docID and a BytesRef
> +    to fill (which must not be null), and it fills it in with the
> +    reference to the bytes for that term.
>
> -* LUCENE-2600: remove IndexReader.isDeleted
> +    If you had code like this before:
>
> -  Instead of IndexReader.isDeleted, do this:
> +        String[] values = FieldCache.DEFAULT.getStrings(reader, field);
> +        ...
> +        String aValue = values[docID];
>
> -    import org.apache.lucene.util.Bits;
> -    import org.apache.lucene.index.MultiFields;
> +    you can do this instead:
>
> -    Bits liveDocs = MultiFields.getLiveDocs(indexReader);
> -    if (!liveDocs.get(docID)) {
> -      // document is deleted...
> -    }
> -
> -* LUCENE-2858, LUCENE-3733: The abstract class IndexReader has been
> -  refactored to expose only essential methods to access stored fields
> -  during display of search results. It is no longer possible to retrieve
> -  terms or postings data from the underlying index, not even deletions are
> -  visible anymore. You can still pass IndexReader as constructor parameter
> -  to IndexSearcher and execute your searches; Lucene will automatically
> -  delegate procedures like query rewriting and document collection atomic
> -  subreaders.
> -
> -  If you want to dive deeper into the index and want to write own queries,
> -  take a closer look at the new abstract sub-classes AtomicReader and
> -  CompositeReader:
> -
> -  AtomicReader instances are now the only source of Terms, Postings,
> -  DocValues and FieldCache. Queries are forced to execute on a Atomic
> -  reader on a per-segment basis and FieldCaches are keyed by
> -  AtomicReaders.
> -
> -  Its counterpart CompositeReader exposes a utility method to retrieve
> -  its composites. But watch out, composites are not necessarily atomic.
> -  Next to the added type-safety we also removed the notion of
> -  index-commits and version numbers from the abstract IndexReader, the
> -  associations with IndexWriter were pulled into a specialized
> -  DirectoryReader. To open Directory-based indexes use
> -  DirectoryReader.open(), the corresponding method in IndexReader is now
> -  deprecated for easier migration. Only DirectoryReader supports commits,
> -  versions, and reopening with openIfChanged(). Terms, postings,
> -  docvalues, and norms can from now on only be retrieved using
> -  AtomicReader; DirectoryReader and MultiReader extend CompositeReader,
> -  only offering stored fields and access to the sub-readers (which may be
> -  composite or atomic).
> -
> -  If you have more advanced code dealing with custom Filters, you might
> -  have noticed another new class hierarchy in Lucene (see LUCENE-2831):
> -  IndexReaderContext with corresponding Atomic-/CompositeReaderContext.
> -
> -  The move towards per-segment search Lucene 2.9 exposed lots of custom
> -  Queries and Filters that couldn't handle it. For example, some Filter
> -  implementations expected the IndexReader passed in is identical to the
> -  IndexReader passed to IndexSearcher with all its advantages like
> -  absolute document IDs etc. Obviously this "paradigm-shift" broke lots of
> -  applications and especially those that utilized cross-segment data
> -  structures (like Apache Solr).
> -
> -  In Lucene 4.0, we introduce IndexReaderContexts "searcher-private"
> -  reader hierarchy. During Query or Filter execution Lucene no longer
> -  passes raw readers down Queries, Filters or Collectors; instead
> -  components are provided an AtomicReaderContext (essentially a hierarchy
> -  leaf) holding relative properties like the document-basis in relation to
> -  the top-level reader. This allows Queries & Filter to build up logic
> -  based on document IDs, albeit the per-segment orientation.
> -
> -  There are still valid use-cases where top-level readers ie. "atomic
> -  views" on the index are desirable. Let say you want to iterate all terms
> -  of a complete index for auto-completion or facetting, Lucene provides
> -  utility wrappers like SlowCompositeReaderWrapper (LUCENE-2597) emulating
> -  an AtomicReader. Note: using "atomicity emulators" can cause serious
> -  slowdowns due to the need to merge terms, postings, DocValues, and
> -  FieldCache, use them with care!
> +        DocTerms values = FieldCache.DEFAULT.getTerms(reader, field);
> +        ...
> +        BytesRef term = new BytesRef();
> +        String aValue = values.getTerm(docID, term).utf8ToString();
>
> -* LUCENE-2674: A new idfExplain method was added to Similarity, that
> -  accepts an incoming docFreq.  If you subclass Similarity, make sure
> -  you also override this method on upgrade, otherwise your
> -  customizations won't run for certain MultiTermQuerys.
> +    Note however that it can be costly to convert to String, so it's
> +    better to work directly with the BytesRef.
>
> -* LUCENE-2413: Lucene's core and contrib analyzers, along with Solr's analyzers,
> -  were consolidated into lucene/analysis. During the refactoring some
> -  package names have changed:
> -    - o.a.l.analysis.KeywordAnalyzer -> o.a.l.analysis.core.KeywordAnalyzer
> -    - o.a.l.analysis.KeywordTokenizer -> o.a.l.analysis.core.KeywordTokenizer
> -    - o.a.l.analysis.LetterTokenizer -> o.a.l.analysis.core.LetterTokenizer
> -    - o.a.l.analysis.LowerCaseFilter -> o.a.l.analysis.core.LowerCaseFilter
> -    - o.a.l.analysis.LowerCaseTokenizer -> o.a.l.analysis.core.LowerCaseTokenizer
> -    - o.a.l.analysis.SimpleAnalyzer -> o.a.l.analysis.core.SimpleAnalyzer
> -    - o.a.l.analysis.StopAnalyzer -> o.a.l.analysis.core.StopAnalyzer
> -    - o.a.l.analysis.StopFilter -> o.a.l.analysis.core.StopFilter
> -    - o.a.l.analysis.WhitespaceAnalyzer -> o.a.l.analysis.core.WhitespaceAnalyzer
> -    - o.a.l.analysis.WhitespaceTokenizer -> o.a.l.analysis.core.WhitespaceTokenizer
> -    - o.a.l.analysis.PorterStemFilter -> o.a.l.analysis.en.PorterStemFilter
> -    - o.a.l.analysis.ASCIIFoldingFilter -> o.a.l.analysis.miscellaneous.ASCIIFoldingFilter
> -    - o.a.l.analysis.ISOLatin1AccentFilter -> o.a.l.analysis.miscellaneous.ISOLatin1AccentFilter
> -    - o.a.l.analysis.KeywordMarkerFilter -> o.a.l.analysis.miscellaneous.KeywordMarkerFilter
> -    - o.a.l.analysis.LengthFilter -> o.a.l.analysis.miscellaneous.LengthFilter
> -    - o.a.l.analysis.PerFieldAnalyzerWrapper -> o.a.l.analysis.miscellaneous.PerFieldAnalyzerWrapper
> -    - o.a.l.analysis.TeeSinkTokenFilter -> o.a.l.analysis.sinks.TeeSinkTokenFilter
> -    - o.a.l.analysis.CharFilter -> o.a.l.analysis.charfilter.CharFilter
> -    - o.a.l.analysis.BaseCharFilter -> o.a.l.analysis.charfilter.BaseCharFilter
> -    - o.a.l.analysis.MappingCharFilter -> o.a.l.analysis.charfilter.MappingCharFilter
> -    - o.a.l.analysis.NormalizeCharMap -> o.a.l.analysis.charfilter.NormalizeCharMap
> -    - o.a.l.analysis.CharArraySet -> o.a.l.analysis.util.CharArraySet
> -    - o.a.l.analysis.CharArrayMap -> o.a.l.analysis.util.CharArrayMap
> -    - o.a.l.analysis.ReusableAnalyzerBase -> o.a.l.analysis.util.ReusableAnalyzerBase
> -    - o.a.l.analysis.StopwordAnalyzerBase -> o.a.l.analysis.util.StopwordAnalyzerBase
> -    - o.a.l.analysis.WordListLoader -> o.a.l.analysis.util.WordListLoader
> -    - o.a.l.analysis.CharTokenizer -> o.a.l.analysis.util.CharTokenizer
> -    - o.a.l.util.CharacterUtils -> o.a.l.analysis.util.CharacterUtils
> +  * Similarly, in FieldCache, getStringIndex (returning a StringIndex
> +    instance, with direct arrays int[] order and String[] lookup) has
> +    been replaced with getTermsIndex (returning a
> +    FieldCache.DocTermsIndex instance).  DocTermsIndex provides the
> +    getOrd(int docID) method to lookup the int order for a document,
> +    lookup(int ord, BytesRef reuse) to lookup the term from a given
> +    order, and the sugar method getTerm(int docID, BytesRef reuse)
> +    which internally calls getOrd and then lookup.
>
> -* LUCENE-2514: The option to use a Collator's order (instead of binary order) for
> -  sorting and range queries has been moved to lucene/queries.
> +    If you had code like this before:
>
> -  The Collated TermRangeQuery/Filter has been moved to SlowCollatedTermRangeQuery/Filter,
> -  and the collated sorting has been moved to SlowCollatedStringComparator.
> +        StringIndex idx = FieldCache.DEFAULT.getStringIndex(reader, field);
> +        ...
> +        int ord = idx.order[docID];
> +        String aValue = idx.lookup[ord];
>
> -  Note: this functionality isn't very scalable and if you are using it, consider
> -  indexing collation keys with the collation support in the analysis module instead.
> -
> -  To perform collated range queries, use a suitable collating analyzer: CollationKeyAnalyzer
> -  or ICUCollationKeyAnalyzer, and set qp.setAnalyzeRangeTerms(true).
> -
> -  TermRangeQuery and TermRangeFilter now work purely on bytes. Both have helper factory methods
> -  (newStringRange) similar to the NumericRange API, to easily perform range queries on Strings.
> -
> -* LUCENE-2691: The near-real-time API has moved from IndexWriter to
> -  IndexReader.  Instead of IndexWriter.getReader(), call
> -  IndexReader.open(IndexWriter) or IndexReader.reopen(IndexWriter).
> +    you can do this instead:
>
> -* LUCENE-2690: MultiTermQuery boolean rewrites per segment.
> -  Also MultiTermQuery.getTermsEnum() now takes an AttributeSource. FuzzyTermsEnum
> -  is both consumer and producer of attributes: MTQ.BoostAttribute is
> -  added to the FuzzyTermsEnum and MTQ's rewrite mode consumes it.
> -  The other way round MTQ.TopTermsBooleanQueryRewrite supplys a
> -  global AttributeSource to each segments TermsEnum. The TermsEnum is consumer
> -  and gets the current minimum competitive boosts (MTQ.MaxNonCompetitiveBoostAttribute).
> +        DocTermsIndex idx = FieldCache.DEFAULT.getTermsIndex(reader, field);
> +        ...
> +        int ord = idx.getOrd(docID);
> +        BytesRef term = new BytesRef();
> +        String aValue = idx.lookup(ord, term).utf8ToString();
>
> -* LUCENE-2374: The backwards layer in AttributeImpl was removed. To support correct
> -  reflection of AttributeImpl instances, where the reflection was done using deprecated
> -  toString() parsing, you have to now override reflectWith() to customize output.
> -  toString() is no longer implemented by AttributeImpl, so if you have overridden
> -  toString(), port your customization over to reflectWith(). reflectAsString() would
> -  then return what toString() did before.
> +    Note however that it can be costly to convert to String, so it's
> +    better to work directly with the BytesRef.
>
> -* LUCENE-2236, LUCENE-2912: DefaultSimilarity can no longer be set statically
> -  (and dangerously) for the entire JVM.
> -  Similarity can now be configured on a per-field basis (via PerFieldSimilarityWrapper)
> -  Similarity has a lower-level API, if you want the higher-level vector-space API
> -  like in previous Lucene releases, then look at TFIDFSimilarity.
> +    DocTermsIndex also has a getTermsEnum() method, which returns an
> +    iterator (TermsEnum) over the term values in the index (ie,
> +    iterates ord = 0..numOrd()-1).
>
> -* LUCENE-1076: TieredMergePolicy is now the default merge policy.
> -  It's able to merge non-contiguous segments; this may cause problems
> -  for applications that rely on Lucene's internal document ID
> -  assigment.  If so, you should instead use LogByteSize/DocMergePolicy
> -  during indexing.
> +  * StringComparatorLocale is now more CPU costly than it was before
> +    (it was already very CPU costly since it does not compare using
> +    indexed collation keys; use CollationKeyFilter for better
> +    performance), since it converts BytesRef -> String on the fly.
> +    Also, the field values returned when sorting by SortField.STRING
> +    are now BytesRef.
>
> -* LUCENE-2883: Lucene's o.a.l.search.function ValueSource based functionality, was consolidated
> -  into lucene/queries along with Solr's similar functionality.  The following classes were moved:
> -   - o.a.l.search.function.CustomScoreQuery -> o.a.l.queries.CustomScoreQuery
> -   - o.a.l.search.function.CustomScoreProvider -> o.a.l.queries.CustomScoreProvider
> -   - o.a.l.search.function.NumericIndexDocValueSource -> o.a.l.queries.function.valuesource.NumericIndexDocValueSource
> -  The following lists the replacement classes for those removed:
> -   - o.a.l.search.function.ByteFieldSource -> o.a.l.queries.function.valuesource.ByteFieldSource
> -   - o.a.l.search.function.DocValues -> o.a.l.queries.function.DocValues
> -   - o.a.l.search.function.FieldCacheSource -> o.a.l.queries.function.valuesource.FieldCacheSource
> -   - o.a.l.search.function.FieldScoreQuery ->o.a.l.queries.function.FunctionQuery
> -   - o.a.l.search.function.FloatFieldSource -> o.a.l.queries.function.valuesource.FloatFieldSource
> -   - o.a.l.search.function.IntFieldSource -> o.a.l.queries.function.valuesource.IntFieldSource
> -   - o.a.l.search.function.OrdFieldSource -> o.a.l.queries.function.valuesource.OrdFieldSource
> -   - o.a.l.search.function.ReverseOrdFieldSource -> o.a.l.queries.function.valuesource.ReverseOrdFieldSource
> -   - o.a.l.search.function.ShortFieldSource -> o.a.l.queries.function.valuesource.ShortFieldSource
> -   - o.a.l.search.function.ValueSource -> o.a.l.queries.function.ValueSource
> -   - o.a.l.search.function.ValueSourceQuery -> o.a.l.queries.function.FunctionQuery
> -
> -   DocValues are now named FunctionValues, to not confuse with Lucene's per-document values.
> -
> -* LUCENE-2392: Enable flexible scoring:
> -
> -  The existing "Similarity" api is now TFIDFSimilarity, if you were extending
> -  Similarity before, you should likely extend this instead.
> -
> -  Weight.normalize no longer takes a norm value that incorporates the top-level
> -  boost from outer queries such as BooleanQuery, instead it takes 2 parameters,
> -  the outer boost (topLevelBoost) and the norm. Weight.sumOfSquaredWeights has
> -  been renamed to Weight.getValueForNormalization().
> +  * FieldComparator.StringOrdValComparator has been renamed to
> +    TermOrdValComparator, and now uses BytesRef for its values.
> +    Likewise for StringValComparator, renamed to TermValComparator.
> +    This means when sorting by SortField.STRING or
> +    SortField.STRING_VAL (or directly invoking these comparators) the
> +    values returned in the FieldDoc.fields array will be BytesRef not
> +    String.  You can call the .utf8ToString() method on the BytesRef
> +    instances, if necessary.
>
> -  The scorePayload method now takes a BytesRef. It is never null.
> +## LUCENE-2600: IndexReaders are now read-only
>
> -* LUCENE-3722: Similarity methods and collection/term statistics now take
> -  long instead of int (to enable distributed scoring of > 2B docs).
> -  For example, in TFIDFSimilarity idf(int, int) is now idf(long, long).
> +  Instead of IndexReader.isDeleted, do this:
>
> -* LUCENE-3559: The methods "docFreq" and "maxDoc" on IndexSearcher were removed,
> -  as these are no longer used by the scoring system.
> +      import org.apache.lucene.util.Bits;
> +      import org.apache.lucene.index.MultiFields;
>
> -  If you were using these casually in your code for reasons unrelated to scoring,
> -  call them on the IndexSearcher's reader instead: getIndexReader().
> +      Bits liveDocs = MultiFields.getLiveDocs(indexReader);
> +      if (!liveDocs.get(docID)) {
> +        // document is deleted...
> +      }
> +
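(One hedged caveat to add to the isDeleted migration above: as far as I know MultiFields.getLiveDocs returns null when the reader has no deletions, so a defensive port of the old check would guard for that; 'indexReader' and 'docID' are assumed to be in scope.)

    import org.apache.lucene.index.MultiFields;
    import org.apache.lucene.util.Bits;

    // Sketch: a null liveDocs means "no deletions", so nothing is deleted.
    Bits liveDocs = MultiFields.getLiveDocs(indexReader);
    if (liveDocs != null && !liveDocs.get(docID)) {
      // document is deleted...
    }
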
> +## LUCENE-2858, LUCENE-3733: IndexReader --> AtomicReader/CompositeReader/DirectoryReader refactoring
>
> -  If you were subclassing IndexSearcher and overriding these methods to alter
> -  scoring, override IndexSearcher's termStatistics() and collectionStatistics()
> -  methods instead.
> +The abstract class IndexReader has been
> +refactored to expose only essential methods to access stored fields
> +during display of search results. It is no longer possible to retrieve
> +terms or postings data from the underlying index, not even deletions are
> +visible anymore. You can still pass IndexReader as constructor parameter
> +to IndexSearcher and execute your searches; Lucene will automatically
> +delegate procedures like query rewriting and document collection to the
> +atomic subreaders.
> +
> +If you want to dive deeper into the index and want to write own queries,
> +take a closer look at the new abstract sub-classes AtomicReader and
> +CompositeReader:
> +
> +AtomicReader instances are now the only source of Terms, Postings,
> +DocValues and FieldCache. Queries are forced to execute on an Atomic
> +reader on a per-segment basis and FieldCaches are keyed by
> +AtomicReaders.
> +
> +Its counterpart CompositeReader exposes a utility method to retrieve
> +its composites. But watch out, composites are not necessarily atomic.
> +Next to the added type-safety we also removed the notion of
> +index-commits and version numbers from the abstract IndexReader, the
> +associations with IndexWriter were pulled into a specialized
> +DirectoryReader. To open Directory-based indexes use
> +DirectoryReader.open(), the corresponding method in IndexReader is now
> +deprecated for easier migration. Only DirectoryReader supports commits,
> +versions, and reopening with openIfChanged(). Terms, postings,
> +docvalues, and norms can from now on only be retrieved using
> +AtomicReader; DirectoryReader and MultiReader extend CompositeReader,
> +only offering stored fields and access to the sub-readers (which may be
> +composite or atomic).
> +
> +If you have more advanced code dealing with custom Filters, you might
> +have noticed another new class hierarchy in Lucene (see LUCENE-2831):
> +IndexReaderContext with corresponding Atomic-/CompositeReaderContext.
> +
> +The move towards per-segment search in Lucene 2.9 exposed lots of custom
> +Queries and Filters that couldn't handle it. For example, some Filter
> +implementations expected the IndexReader passed in to be identical to the
> +IndexReader passed to IndexSearcher, with all its advantages like
> +absolute document IDs etc. Obviously this "paradigm shift" broke lots of
> +applications, especially those that utilized cross-segment data
> +structures (like Apache Solr).
> +
> +In Lucene 4.0, we introduce IndexReaderContexts, a "searcher-private"
> +reader hierarchy. During Query or Filter execution Lucene no longer
> +passes raw readers down to Queries, Filters or Collectors; instead
> +components are provided an AtomicReaderContext (essentially a hierarchy
> +leaf) holding relative properties like the document basis in relation to
> +the top-level reader. This allows Queries & Filters to build up logic
> +based on document IDs, despite the per-segment orientation.
> +
> +There are still valid use-cases where top-level readers, i.e. "atomic
> +views" on the index, are desirable. Say you want to iterate all terms
> +of a complete index for auto-completion or faceting: Lucene provides
> +utility wrappers like SlowCompositeReaderWrapper (LUCENE-2597) emulating
> +an AtomicReader. Note: using "atomicity emulators" can cause serious
> +slowdowns due to the need to merge terms, postings, DocValues, and
> +FieldCache; use them with care!
> +
> +## LUCENE-2413: Analyzer package changes
> +
> +Lucene's core and contrib analyzers, along with Solr's analyzers,
> +were consolidated into lucene/analysis. During the refactoring some
> +package names have changed:
> +
> +  - o.a.l.analysis.KeywordAnalyzer -> o.a.l.analysis.core.KeywordAnalyzer
> +  - o.a.l.analysis.KeywordTokenizer -> o.a.l.analysis.core.KeywordTokenizer
> +  - o.a.l.analysis.LetterTokenizer -> o.a.l.analysis.core.LetterTokenizer
> +  - o.a.l.analysis.LowerCaseFilter -> o.a.l.analysis.core.LowerCaseFilter
> +  - o.a.l.analysis.LowerCaseTokenizer -> o.a.l.analysis.core.LowerCaseTokenizer
> +  - o.a.l.analysis.SimpleAnalyzer -> o.a.l.analysis.core.SimpleAnalyzer
> +  - o.a.l.analysis.StopAnalyzer -> o.a.l.analysis.core.StopAnalyzer
> +  - o.a.l.analysis.StopFilter -> o.a.l.analysis.core.StopFilter
> +  - o.a.l.analysis.WhitespaceAnalyzer -> o.a.l.analysis.core.WhitespaceAnalyzer
> +  - o.a.l.analysis.WhitespaceTokenizer -> o.a.l.analysis.core.WhitespaceTokenizer
> +  - o.a.l.analysis.PorterStemFilter -> o.a.l.analysis.en.PorterStemFilter
> +  - o.a.l.analysis.ASCIIFoldingFilter -> o.a.l.analysis.miscellaneous.ASCIIFoldingFilter
> +  - o.a.l.analysis.ISOLatin1AccentFilter -> o.a.l.analysis.miscellaneous.ISOLatin1AccentFilter
> +  - o.a.l.analysis.KeywordMarkerFilter -> o.a.l.analysis.miscellaneous.KeywordMarkerFilter
> +  - o.a.l.analysis.LengthFilter -> o.a.l.analysis.miscellaneous.LengthFilter
> +  - o.a.l.analysis.PerFieldAnalyzerWrapper -> o.a.l.analysis.miscellaneous.PerFieldAnalyzerWrapper
> +  - o.a.l.analysis.TeeSinkTokenFilter -> o.a.l.analysis.sinks.TeeSinkTokenFilter
> +  - o.a.l.analysis.CharFilter -> o.a.l.analysis.charfilter.CharFilter
> +  - o.a.l.analysis.BaseCharFilter -> o.a.l.analysis.charfilter.BaseCharFilter
> +  - o.a.l.analysis.MappingCharFilter -> o.a.l.analysis.charfilter.MappingCharFilter
> +  - o.a.l.analysis.NormalizeCharMap -> o.a.l.analysis.charfilter.NormalizeCharMap
> +  - o.a.l.analysis.CharArraySet -> o.a.l.analysis.util.CharArraySet
> +  - o.a.l.analysis.CharArrayMap -> o.a.l.analysis.util.CharArrayMap
> +  - o.a.l.analysis.ReusableAnalyzerBase -> o.a.l.analysis.util.ReusableAnalyzerBase
> +  - o.a.l.analysis.StopwordAnalyzerBase -> o.a.l.analysis.util.StopwordAnalyzerBase
> +  - o.a.l.analysis.WordListLoader -> o.a.l.analysis.util.WordListLoader
> +  - o.a.l.analysis.CharTokenizer -> o.a.l.analysis.util.CharTokenizer
> +  - o.a.l.util.CharacterUtils -> o.a.l.analysis.util.CharacterUtils
> +
> +## LUCENE-2514: Collators
> +
> +The option to use a Collator's order (instead of binary order) for
> +sorting and range queries has been moved to lucene/queries.
> +The Collated TermRangeQuery/Filter has been moved to SlowCollatedTermRangeQuery/Filter,
> +and the collated sorting has been moved to SlowCollatedStringComparator.
> +
> +Note: this functionality isn't very scalable and if you are using it, consider
> +indexing collation keys with the collation support in the analysis module instead.
> +
> +To perform collated range queries, use a suitable collating analyzer: CollationKeyAnalyzer
> +or ICUCollationKeyAnalyzer, and set qp.setAnalyzeRangeTerms(true).
> +
> +TermRangeQuery and TermRangeFilter now work purely on bytes. Both have helper factory methods
> +(newStringRange) similar to the NumericRange API, to easily perform range queries on Strings.
> +
> +## LUCENE-2883: ValueSource changes
> +
> +Lucene's o.a.l.search.function ValueSource-based functionality was consolidated
> +into lucene/queries along with Solr's similar functionality.  The following classes were moved:
> +
> + - o.a.l.search.function.CustomScoreQuery -> o.a.l.queries.CustomScoreQuery
> + - o.a.l.search.function.CustomScoreProvider -> o.a.l.queries.CustomScoreProvider
> + - o.a.l.search.function.NumericIndexDocValueSource -> o.a.l.queries.function.valuesource.NumericIndexDocValueSource
> +
> +The following lists the replacement classes for those removed:
> +
> + - o.a.l.search.function.ByteFieldSource -> o.a.l.queries.function.valuesource.ByteFieldSource
> + - o.a.l.search.function.DocValues -> o.a.l.queries.function.DocValues
> + - o.a.l.search.function.FieldCacheSource -> o.a.l.queries.function.valuesource.FieldCacheSource
> + - o.a.l.search.function.FieldScoreQuery -> o.a.l.queries.function.FunctionQuery
> + - o.a.l.search.function.FloatFieldSource -> o.a.l.queries.function.valuesource.FloatFieldSource
> + - o.a.l.search.function.IntFieldSource -> o.a.l.queries.function.valuesource.IntFieldSource
> + - o.a.l.search.function.OrdFieldSource -> o.a.l.queries.function.valuesource.OrdFieldSource
> + - o.a.l.search.function.ReverseOrdFieldSource -> o.a.l.queries.function.valuesource.ReverseOrdFieldSource
> + - o.a.l.search.function.ShortFieldSource -> o.a.l.queries.function.valuesource.ShortFieldSource
> + - o.a.l.search.function.ValueSource -> o.a.l.queries.function.ValueSource
> + - o.a.l.search.function.ValueSourceQuery -> o.a.l.queries.function.FunctionQuery
> +
> +DocValues are now named FunctionValues, so as not to confuse them with Lucene's per-document values.
> +
> +## LUCENE-2392: Enable flexible scoring
> +
> +The existing "Similarity" api is now TFIDFSimilarity, if you were extending
> +Similarity before, you should likely extend this instead.
> +
> +Weight.normalize no longer takes a norm value that incorporates the top-level
> +boost from outer queries such as BooleanQuery; instead it takes two parameters,
> +the outer boost (topLevelBoost) and the norm. Weight.sumOfSquaredWeights has
> +been renamed to Weight.getValueForNormalization().
> +
> +The scorePayload method now takes a BytesRef. It is never null.
> +
> +## LUCENE-3283: Query parsers moved to separate module
> +
> +Lucene's core o.a.l.queryParser QueryParsers have been consolidated into lucene/queryparser,
> +where other QueryParsers from the codebase will also be placed.  The following classes were moved:
>
> -* LUCENE-3283: Lucene's core o.a.l.queryParser QueryParsers have been consolidated into lucene/queryparser,
> -  where other QueryParsers from the codebase will also be placed.  The following classes were moved:
>   - o.a.l.queryParser.CharStream -> o.a.l.queryparser.classic.CharStream
>   - o.a.l.queryParser.FastCharStream -> o.a.l.queryparser.classic.FastCharStream
>   - o.a.l.queryParser.MultiFieldQueryParser -> o.a.l.queryparser.classic.MultiFieldQueryParser
> @@ -480,9 +444,7 @@ LUCENE-1458, LUCENE-2111: Flexible Index
>   - o.a.l.queryParser.QueryParserToken -> o.a.l.queryparser.classic.Token
>   - o.a.l.queryParser.QueryParserTokenMgrError -> o.a.l.queryparser.classic.TokenMgrError
>
> -
> -
> -* LUCENE-2308,LUCENE-3453: Separate IndexableFieldType from Field instances
> +## LUCENE-2308, LUCENE-3453: Separate IndexableFieldType from Field instances
>
>  With this change, the indexing details (indexed, tokenized, norms,
>  indexOptions, stored, etc.) are moved into a separate FieldType
> @@ -498,15 +460,11 @@ Certain field types are pre-defined sinc
>     not tokenize).  This field turns off norms and indexes only doc
>     IDS (does not index term frequency nor positions).  This field
>     does not store its value, but exposes TYPE_STORED as well.
> -
>   * TextField: indexes and tokenizes a String, Reader or TokenStream
>     value, without term vectors.  This field does not store its value,
>     but exposes TYPE_STORED as well.
> -
>   * StoredField: field that stores its value
> -
>   * DocValuesField: indexes the value as a DocValues field
> -
>   * NumericField: indexes the numeric value so that NumericRangeQuery
>     can be used at search-time.
>
> @@ -515,23 +473,22 @@ instantiate the above class.  If you nee
>  add a separate StoredField to the document, or you can use
>  TYPE_STORED for the field:
>
> -  Field f = new Field("field", "value", StringField.TYPE_STORED);
> +    Field f = new Field("field", "value", StringField.TYPE_STORED);
>
>  Alternatively, if an existing type is close to what you want but you
>  need to make a few changes, you can copy that type and make changes:
>
> -  FieldType bodyType = new FieldType(TextField.TYPE_STORED);
> -  bodyType.setStoreTermVectors(true);
> -
> +    FieldType bodyType = new FieldType(TextField.TYPE_STORED);
> +    bodyType.setStoreTermVectors(true);
>
>  You can of course also create your own FieldType from scratch:
>
> -  FieldType t = new FieldType();
> -  t.setIndexed(true);
> -  t.setStored(true);
> -  t.setOmitNorms(true);
> -  t.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> -  t.freeze();
> +    FieldType t = new FieldType();
> +    t.setIndexed(true);
> +    t.setStored(true);
> +    t.setOmitNorms(true);
> +    t.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> +    t.freeze();
>
>  FieldType has a freeze() method to prevent further changes.
>
> @@ -541,65 +498,116 @@ enums.
>
>  When migrating from the 3.x API, if you did this before:
>
> -  new Field("field", "value", Field.Store.NO, Field.Indexed.NOT_ANALYZED_NO_NORMS)
> +    new Field("field", "value", Field.Store.NO, Field.Indexed.NOT_ANALYZED_NO_NORMS)
>
>  you can now do this:
>
> -  new StringField("field", "value")
> +    new StringField("field", "value")
>
>  (though note that StringField indexes DOCS_ONLY).
>
>  If instead the value was stored:
>
> -  new Field("field", "value", Field.Store.YES, Field.Indexed.NOT_ANALYZED_NO_NORMS)
> +    new Field("field", "value", Field.Store.YES, Field.Indexed.NOT_ANALYZED_NO_NORMS)
>
>  you can now do this:
>
> -  new Field("field", "value", StringField.TYPE_STORED)
> +    new Field("field", "value", StringField.TYPE_STORED)
>
>  If you didn't omit norms:
>
> -  new Field("field", "value", Field.Store.YES, Field.Indexed.NOT_ANALYZED)
> +    new Field("field", "value", Field.Store.YES, Field.Indexed.NOT_ANALYZED)
>
>  you can now do this:
>
> -  FieldType ft = new FieldType(StringField.TYPE_STORED);
> -  ft.setOmitNorms(false);
> -  new Field("field", "value", ft)
> +    FieldType ft = new FieldType(StringField.TYPE_STORED);
> +    ft.setOmitNorms(false);
> +    new Field("field", "value", ft)
>
>  If you did this before (value can be String or Reader):
>
> -  new Field("field", value, Field.Store.NO, Field.Indexed.ANALYZED)
> +    new Field("field", value, Field.Store.NO, Field.Indexed.ANALYZED)
>
>  you can now do this:
>
> -  new TextField("field", value)
> +    new TextField("field", value)
>
>  If instead the value was stored:
>
> -  new Field("field", value, Field.Store.YES, Field.Indexed.ANALYZED)
> +    new Field("field", value, Field.Store.YES, Field.Indexed.ANALYZED)
>
>  you can now do this:
>
> -  new Field("field", value, TextField.TYPE_STORED)
> +    new Field("field", value, TextField.TYPE_STORED)
>
>  If in addition you omit norms:
>
> -  new Field("field", value, Field.Store.YES, Field.Indexed.ANALYZED_NO_NORMS)
> +    new Field("field", value, Field.Store.YES, Field.Indexed.ANALYZED_NO_NORMS)
>
>  you can now do this:
>
> -  FieldType ft = new FieldType(TextField.TYPE_STORED);
> -  ft.setOmitNorms(true);
> -  new Field("field", value, ft)
> +    FieldType ft = new FieldType(TextField.TYPE_STORED);
> +    ft.setOmitNorms(true);
> +    new Field("field", value, ft)
>
>  If you did this before (bytes is a byte[]):
>
> -  new Field("field", bytes)
> +    new Field("field", bytes)
>
>  you can now do this:
>
> -  new StoredField("field", bytes)
> +    new StoredField("field", bytes)
> +
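(To tie the field examples above together, here is a hedged sketch of assembling a document with the new sugar classes; 'writer' and 'bytes' are assumed to be an open IndexWriter and a byte[] respectively, and the constructor forms follow the examples shown above.)

    Document doc = new Document();
    doc.add(new StringField("id", "42"));                                 // indexed as a single token, DOCS_ONLY
    doc.add(new Field("title", "Apache Lucene", TextField.TYPE_STORED));  // tokenized and stored
    doc.add(new StoredField("payload", bytes));                           // stored only, not indexed
    writer.addDocument(doc);
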
> +## Other changes
> +
> +* LUCENE-2674:
> +  A new idfExplain method was added to Similarity that
> +  accepts an incoming docFreq.  If you subclass Similarity, make sure
> +  you also override this method on upgrade, otherwise your
> +  customizations won't run for certain MultiTermQuerys.
> +
> +* LUCENE-2691: The near-real-time API has moved from IndexWriter to
> +  IndexReader.  Instead of IndexWriter.getReader(), call
> +  IndexReader.open(IndexWriter) or IndexReader.reopen(IndexWriter).
> +
> +* LUCENE-2690: MultiTermQuery boolean rewrites per segment.
> +  Also MultiTermQuery.getTermsEnum() now takes an AttributeSource. FuzzyTermsEnum
> +  is both consumer and producer of attributes: MTQ.BoostAttribute is
> +  added to the FuzzyTermsEnum and MTQ's rewrite mode consumes it.
> +  The other way round, MTQ.TopTermsBooleanQueryRewrite supplies a
> +  global AttributeSource to each segment's TermsEnum. The TermsEnum is the consumer
> +  and gets the current minimum competitive boosts (MTQ.MaxNonCompetitiveBoostAttribute).
> +
> +* LUCENE-2374: The backwards layer in AttributeImpl was removed. To support correct
> +  reflection of AttributeImpl instances, where the reflection was done using deprecated
> +  toString() parsing, you have to now override reflectWith() to customize output.
> +  toString() is no longer implemented by AttributeImpl, so if you have overridden
> +  toString(), port your customization over to reflectWith(). reflectAsString() would
> +  then return what toString() did before.
> +
> +* LUCENE-2236, LUCENE-2912: DefaultSimilarity can no longer be set statically
> +  (and dangerously) for the entire JVM.
> +  Similarity can now be configured on a per-field basis (via PerFieldSimilarityWrapper).
> +  Similarity has a lower-level API; if you want the higher-level vector-space API
> +  like in previous Lucene releases, then look at TFIDFSimilarity.
> +
> +* LUCENE-1076: TieredMergePolicy is now the default merge policy.
> +  It's able to merge non-contiguous segments; this may cause problems
> +  for applications that rely on Lucene's internal document ID
> +  assignment.  If so, you should instead use LogByteSize/DocMergePolicy
> +  during indexing.
> +
> +* LUCENE-3722: Similarity methods and collection/term statistics now take
> +  long instead of int (to enable distributed scoring of > 2B docs).
> +  For example, in TFIDFSimilarity idf(int, int) is now idf(long, long).
> +
> +* LUCENE-3559: The methods "docFreq" and "maxDoc" on IndexSearcher were removed,
> +  as these are no longer used by the scoring system.
> +  If you were using these casually in your code for reasons unrelated to scoring,
> +  call them on the IndexSearcher's reader instead: getIndexReader().
> +  If you were subclassing IndexSearcher and overriding these methods to alter
> +  scoring, override IndexSearcher's termStatistics() and collectionStatistics()
> +  methods instead.
>
>  * LUCENE-3396: Analyzer.tokenStream() and .reusableTokenStream() have been made final.
>   It is now necessary to use Analyzer.TokenStreamComponents to define an analysis process.
> @@ -616,7 +624,6 @@ you can now do this:
>   set integer, float and byte values if a single byte is not sufficient.
>
>  * LUCENE-2621: Term vectors are now accessed via flexible indexing API.
> -
>   If you used IndexReader.getTermFreqVector/s before, you should now
>   use IndexReader.getTermVectors.  The new method returns a Fields
>   instance exposing the inverted index of the one document.  From
>
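(Pulling the flexible-indexing pieces of MIGRATE.txt together, here is a hedged end-to-end sketch of walking the fields, terms and postings of a possibly multi-segment reader. It follows the method names shown above, e.g. Fields.iterator() is assumed to return the FieldsEnum, and the exact signatures may differ from the final 4.0 API.)

    Bits liveDocs = MultiFields.getLiveDocs(reader);   // null if the reader has no deletions
    Fields fields = MultiFields.getFields(reader);     // merges segments on the fly
    if (fields != null) {
      FieldsEnum fieldsEnum = fields.iterator();       // assumed accessor for the FieldsEnum
      String field;
      while ((field = fieldsEnum.next()) != null) {
        TermsEnum termsEnum = fieldsEnum.terms();
        BytesRef text;
        while ((text = termsEnum.next()) != null) {
          DocsEnum docsEnum = termsEnum.docs(liveDocs, null);  // pass a prior enum here to reuse it
          int doc;
          while ((doc = docsEnum.next()) != DocsEnum.NO_MORE_DOCS) {
            // use field, text.utf8ToString(), doc
          }
        }
      }
    }
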
> Modified: lucene/dev/trunk/lucene/README.txt
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/README.txt?rev=1328978&r1=1328977&r2=1328978&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/README.txt (original)
> +++ lucene/dev/trunk/lucene/README.txt Sun Apr 22 21:15:27 2012
> @@ -1,52 +1,21 @@
> -Apache Lucene README file
> +# Apache Lucene README file
>
> -INTRODUCTION
> +## Introduction
>
>  Lucene is a Java full-text search engine.  Lucene is not a complete
>  application, but rather a code library and API that can easily be used
>  to add search capabilities to applications.
>
> -The Lucene web site is at:
> -  http://lucene.apache.org/
> + * The Lucene web site is at: http://lucene.apache.org/
> + * Please join the Lucene-User mailing list by sending a message to:
> +   java-user-subscribe@lucene.apache.org
>
> -Please join the Lucene-User mailing list by sending a message to:
> -  java-user-subscribe@lucene.apache.org
> -
> -Files in a binary distribution:
> +## Files in a binary distribution
>
>  Files are organized by module, for example in core/:
>
> -core/lucene-core-XX.jar
> +* `core/lucene-core-XX.jar`:
>   The compiled core Lucene library.
>
> -Additional modules contain the same structure:
> -
> -analysis/common/: Analyzers for indexing content in different languages and domains
> -analysis/icu/: Analysis integration with ICU (International Components for Unicode)
> -analysis/kuromoji/: Analyzer for indexing Japanese
> -analysis/morfologik/: Analyzer for indexing Polish
> -analysis/phonetic/: Analyzer for indexing phonetic signatures (for sounds-alike search)
> -analysis/smartcn/: Analyzer for indexing Chinese
> -analysis/stempel/: Analyzer for indexing Polish
> -analysis/uima/: Analysis integration with Apache UIMA
> -benchmark/: System for benchmarking Lucene
> -demo/: Simple example code
> -facet/: Faceted indexing and search capabilities
> -grouping/: Search result grouping
> -highlighter/: Highlights search keywords in results
> -join/: Index-time and Query-time joins for normalized content
> -memory/: Single-document in memory index implementation
> -misc/: Index tools and other miscellaneous code
> -queries/: Filters and Queries that add to core Lucene
> -queryparser/: Query parsers and parsing framework
> -sandbox/: Various third party contributions and new ideas.
> -spatial/: Geospatial search
> -suggest/: Auto-suggest and Spellchecking support
> -test-framework/:  Test Framework for testing Lucene-based applications
> -
> -docs/index.html
> -  The contents of the Lucene website.
> -
> -docs/api/index.html
> -  The Javadoc Lucene API documentation.  This includes the core library,
> -  the test framework, and the demo, as well as all other modules.
> +To review the documentation, read the main documentation page, located at:
> +`docs/index.html`
>
> Modified: lucene/dev/trunk/lucene/build.xml
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/build.xml?rev=1328978&r1=1328977&r2=1328978&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/build.xml (original)
> +++ lucene/dev/trunk/lucene/build.xml Sun Apr 22 21:15:27 2012
> @@ -184,11 +184,11 @@
>   </target>
>
>   <target name="documentation" description="Generate all documentation"
> -    depends="javadocs,changes-to-html,doc-index"/>
> +    depends="javadocs,changes-to-html,process-webpages"/>
>   <target name="javadoc" depends="javadocs"/>
>   <target name="javadocs" description="Generate javadoc" depends="javadocs-lucene-core, javadocs-modules, javadocs-test-framework"/>
>
> -  <target name="doc-index">
> +  <target name="process-webpages" depends="resolve-pegdown">
>     <pathconvert pathsep="|" dirsep="/" property="buildfiles">
>       <fileset dir="." includes="**/build.xml" excludes="build.xml,analysis/*,build/**,tools/**,backwards/**,site/**"/>
>     </pathconvert>
> @@ -205,6 +205,12 @@
>       <param name="buildfiles" expression="${buildfiles}"/>
>       <param name="version" expression="${version}"/>
>     </xslt>
> +
> +    <pegdown todir="${javadoc.dir}">
> +      <fileset dir="." includes="MIGRATE.txt,JRE_VERSION_MIGRATION.txt"/>
> +      <globmapper from="*.txt" to="*.html"/>
> +    </pegdown>
> +
>     <copy todir="${javadoc.dir}">
>       <fileset dir="site/html" includes="**/*"/>
>     </copy>
>
> Modified: lucene/dev/trunk/lucene/common-build.xml
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/common-build.xml?rev=1328978&r1=1328977&r2=1328978&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/common-build.xml (original)
> +++ lucene/dev/trunk/lucene/common-build.xml Sun Apr 22 21:15:27 2012
> @@ -1506,4 +1506,60 @@ ${tests-output}/junit4-*.suites     - pe
>       </scp>
>     </sequential>
>   </macrodef>
> +
> +  <!-- PEGDOWN macro: before using it, depend on the target "resolve-pegdown" -->
> +
> +  <target name="resolve-pegdown" unless="pegdown.loaded">
> +    <ivy:cachepath organisation="org.pegdown" module="pegdown" revision="1.1.0"
> +      inline="true" conf="default" type="jar" transitive="true" pathid="pegdown.classpath"/>
> +    <property name="pegdown.loaded" value="true"/>
> +  </target>
> +
> +  <macrodef name="pegdown">
> +    <attribute name="todir"/>
> +    <attribute name="flatten" default="false"/>
> +    <attribute name="overwrite" default="false"/>
> +    <element name="nested" optional="false" implicit="true"/>
> +    <sequential>
> +      <copy todir="@{todir}" flatten="@{flatten}" overwrite="@{overwrite}" verbose="true"
> +        preservelastmodified="false" encoding="UTF-8" outputencoding="UTF-8"
> +      >
> +        <filterchain>
> +          <tokenfilter>
> +            <filetokenizer/>
> +            <replaceregex pattern="\b(LUCENE|SOLR)\-\d+\b" replace="[\0](https://issues.apache.org/jira/browse/\0)" flags="gs"/>
> +            <scriptfilter language="javascript" classpathref="pegdown.classpath"><![CDATA[
> +              importClass(java.lang.StringBuilder);
> +              importClass(org.pegdown.PegDownProcessor);
> +              importClass(org.pegdown.Extensions);
> +              importClass(org.pegdown.FastEncoder);
> +              var markdownSource = self.getToken();
> +              var title = undefined;
> +              if (markdownSource.search(/^(#+\s*)?(.+)[\n\r]/) == 0) {
> +                title = RegExp.$2;
> +                // Convert the first line into a markdown heading, if it is not already:
> +                if (RegExp.$1 == '') {
> +                  markdownSource = '# ' + markdownSource;
> +                }
> +              }
> +              var processor = new PegDownProcessor(
> +                Extensions.ABBREVIATIONS | Extensions.AUTOLINKS |
> +                Extensions.FENCED_CODE_BLOCKS | Extensions.SMARTS
> +              );
> +              var html = new StringBuilder('<html>\n<head>\n');
> +              if (title) {
> +                html.append('<title>').append(FastEncoder.encode(title)).append('</title>\n');
> +              }
> +              html.append('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n')
> +                .append('</head>\n<body>\n')
> +                .append(processor.markdownToHtml(markdownSource))
> +                .append('\n</body>\n</html>\n');
> +              self.setToken(html.toString());
> +            ]]></scriptfilter>
> +          </tokenfilter>
> +        </filterchain>
> +        <nested/>
> +      </copy>
> +    </sequential>
> +  </macrodef>
>  </project>
>
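(For checking the generated HTML outside of Ant, here is a hedged Java sketch using the same pegdown 1.1.0 calls the scriptfilter above relies on; the class name and the Java 7 file reading are my own additions, not part of the commit.)

    import org.pegdown.Extensions;
    import org.pegdown.PegDownProcessor;

    // Sketch: render one Markdown-ish .txt file to an HTML fragment on stdout,
    // mirroring what the <pegdown> macro above does per file.
    public class PegdownPreview {
      public static void main(String[] args) throws Exception {
        byte[] bytes = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        String markdownSource = new String(bytes, "UTF-8");
        PegDownProcessor processor = new PegDownProcessor(
            Extensions.ABBREVIATIONS | Extensions.AUTOLINKS |
            Extensions.FENCED_CODE_BLOCKS | Extensions.SMARTS);
        System.out.println(processor.markdownToHtml(markdownSource));
      }
    }
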
> Modified: lucene/dev/trunk/lucene/site/xsl/index.xsl
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/site/xsl/index.xsl?rev=1328978&r1=1328977&r2=1328978&view=diff
> ==============================================================================
> --- lucene/dev/trunk/lucene/site/xsl/index.xsl (original)
> +++ lucene/dev/trunk/lucene/site/xsl/index.xsl Sun Apr 22 21:15:27 2012
> @@ -37,11 +37,14 @@
>       <body>
>         <div><img src="lucene_green_300.gif"/></div>
>         <h1><xsl:text>Apache Lucene </xsl:text><xsl:value-of select="$version"/><xsl:text> Documentation</xsl:text></h1>
> +        <p>Lucene is a Java full-text search engine. Lucene is not a complete application,
> +        but rather a code library and API that can easily be used to add search capabilities
> +        to applications.</p>
>         <p>
>           This is the official documentation for <b><xsl:text>Apache Lucene </xsl:text>
>           <xsl:value-of select="$version"/></b>. Additional documentation is available in the
>           <a href="http://wiki.apache.org/lucene-java">Wiki</a>.
> -        </p>
> +        </p>
>         <h2>Getting Started</h2>
>         <p>The following section is intended as a "getting started" guide. It has three
>         audiences: first-time users looking to install Apache Lucene in their
> @@ -60,6 +63,8 @@
>         <h2>Reference Documents</h2>
>           <ul>
>             <li><a href="changes/Changes.html">Changes</a>: List of changes in this release.</li>
> +            <li><a href="MIGRATE.html">Migration Guide</a>: What changed in Lucene 4; how to migrate code from Lucene 3.x.</li>
> +            <li><a href="JRE_VERSION_MIGRATION.html">JRE Version Migration</a>: Information about upgrading between major JRE versions.</li>
>             <li><a href="fileformats.html">File Formats</a>: Guide to the index format used by Lucene.</li>
>             <li><a href="core/org/apache/lucene/search/package-summary.html#package_description">Search and Scoring in Lucene</a>: Introduction to how Lucene scores documents.</li>
>             <li><a href="core/org/apache/lucene/search/similarities/TFIDFSimilarity.html">Classic Scoring Formula</a>: Formula of Lucene's classic <a href="http://en.wikipedia.org/wiki/Vector_Space_Model">Vector Space</a> implementation. (look <a href="core/org/apache/lucene/search/similarities/package-summary.html#package_description">here</a> for other models)</li>
>
>



-- 
lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org