Posted to commits@lucene.apache.org by rm...@apache.org on 2013/02/17 02:16:53 UTC

svn commit: r1446988 - in /lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs: lucene41/package.html lucene42/Lucene42DocValuesFormat.java lucene42/package.html

Author: rmuir
Date: Sun Feb 17 01:16:53 2013
New Revision: 1446988

URL: http://svn.apache.org/r1446988
Log:
file formats

Modified:
    lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene41/package.html
    lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.java
    lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/package.html

Modified: lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene41/package.html
URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene41/package.html?rev=1446988&r1=1446987&r2=1446988&view=diff
==============================================================================
--- lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene41/package.html (original)
+++ lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene41/package.html Sun Feb 17 01:16:53 2013
@@ -375,7 +375,8 @@ can optionally be indexed into the posti
 term vectors.</li>
 <li>In version 4.1, the format of the postings list changed to use either
 of FOR compression or variable-byte encoding, depending upon the frequency
-of the term.</li>
+of the term. Terms appearing only once were changed to inline directly into
+the term dictionary. Stored fields are compressed by default. </li>
 </ul>
 <a name="Limitations" id="Limitations"></a>
 <h2>Limitations</h2>

Modified: lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.java?rev=1446988&r1=1446987&r2=1446988&view=diff
==============================================================================
--- lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.java (original)
+++ lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.java Sun Feb 17 01:16:53 2013
@@ -34,7 +34,7 @@ import org.apache.lucene.util.packed.Blo
 /**
  * Lucene 4.2 DocValues format.
  * <p>
- * Encodes the three per-document value types (Numeric,Binary,Sorted) with five basic strategies.
+ * Encodes the four per-document value types (Numeric,Binary,Sorted,SortedSet) with seven basic strategies.
  * <p>
  * <ul>
  *    <li>Delta-compressed Numerics: per-document integers written in blocks of 4096. For each block
@@ -51,7 +51,9 @@ import org.apache.lucene.util.packed.Blo
  *        start for the block, and the average (expected) delta per entry. For each document the 
  *        deviation from the delta (actual - expected) is written.
  *    <li>Sorted: an FST mapping deduplicated terms to ordinals is written, along with the per-document
- *        ordinals written using one of the numeric stratgies above.
+ *        ordinals written using one of the numeric strategies above.
+ *    <li>SortedSet: an FST mapping deduplicated terms to ordinals is written, along with the per-document
+ *        ordinal list written using one of the binary strategies above.  
  * </ul>
  * <p>
  * Files:
@@ -77,6 +79,8 @@ import org.apache.lucene.util.packed.Blo
  *   </ul>
  *   <p>Sorted fields have two entries: a SortedEntry with the FST metadata,
  *      and an ordinary NumericEntry for the document-to-ord metadata.</p>
+ *   <p>SortedSet fields have two entries: a SortedEntry with the FST metadata,
+ *      and an ordinary BinaryEntry for the document-to-ord-list metadata.</p>
  *   <p>FieldNumber of -1 indicates the end of metadata.</p>
  *   <p>EntryType is a 0 (NumericEntry), 1 (BinaryEntry, or 2 (SortedEntry)</p>
  *   <p>DataOffset is the pointer to the start of the data in the DocValues data (.dvd)</p>
@@ -107,6 +111,8 @@ import org.apache.lucene.util.packed.Blo
  *     <li>UncompressedNumerics --&gt; {@link DataOutput#writeByte Byte}<sup>maxdoc</sup></li>
  *     <li>Addresses --&gt; {@link MonotonicBlockPackedWriter MonotonicBlockPackedInts(blockSize=4096)}</li>
  *   </ul>
+ *   <p>SortedSet entries store the list of ordinals in their BinaryData as a
+ *      sequences of increasing {@link DataOutput#writeVLong vLong}s, delta-encoded.</p>       
  * </ol>
  */
 public final class Lucene42DocValuesFormat extends DocValuesFormat {
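
Note on the SortedSet BinaryData layout described in the hunk above: a minimal, self-contained sketch of writing one document's increasing ordinal list as delta-encoded vLongs. It is not the committed implementation. The class name OrdListSketch and the methods encode and decode are hypothetical, the vLong writer is a standalone re-implementation rather than a call to DataOutput#writeVLong, and encoding the first ordinal as a delta from 0 is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration class; not part of Lucene.
class OrdListSketch {

  // Writes a non-negative long seven bits per byte, low-order bits first.
  // The high bit of each byte signals that another byte follows.
  static void writeVLong(ByteArrayOutputStream out, long value) {
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7FL) | 0x80L));
      value >>>= 7;
    }
    out.write((int) value);
  }

  // Encodes an increasing list of ordinals as vLong deltas.
  // Assumption: the first ordinal is stored as its delta from 0.
  static byte[] encode(long[] sortedOrds) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    long previous = 0;
    for (long ord : sortedOrds) {
      writeVLong(out, ord - previous);
      previous = ord;
    }
    return out.toByteArray();
  }

  // Reads vLong deltas until the bytes are exhausted and rebuilds the ordinals.
  static long[] decode(byte[] bytes) {
    long[] ords = new long[bytes.length]; // each ordinal takes at least one byte
    int count = 0;
    int pos = 0;
    long previous = 0;
    while (pos < bytes.length) {
      long delta = 0;
      int shift = 0;
      byte b;
      do {
        b = bytes[pos++];
        delta |= ((long) (b & 0x7F)) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      previous += delta;
      ords[count++] = previous;
    }
    return Arrays.copyOf(ords, count);
  }

  public static void main(String[] args) {
    long[] ords = {3, 7, 8, 300};
    byte[] encoded = encode(ords);
    System.out.println(encoded.length + " bytes for " + ords.length + " ordinals");
    System.out.println(Arrays.toString(decode(encoded)));
  }
}
```

Because the ordinals in a document's list are increasing, each delta is non-negative and usually small, so most entries fit in a single byte.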

Modified: lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/package.html
URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/package.html?rev=1446988&r1=1446987&r2=1446988&view=diff
==============================================================================
--- lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/package.html (original)
+++ lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/package.html Sun Feb 17 01:16:53 2013
@@ -375,7 +375,11 @@ can optionally be indexed into the posti
 term vectors.</li>
 <li>In version 4.1, the format of the postings list changed to use either
 of FOR compression or variable-byte encoding, depending upon the frequency
-of the term.</li>
+of the term. Terms appearing only once were changed to inline directly into
+the term dictionary. Stored fields are compressed by default. </li>
+<li>In version 4.2, term vectors are compressed by default. DocValues has 
+a new multi-valued type (SortedSet), that can be used for faceting/grouping/joining
+on multi-valued fields.</li>
 </ul>
 <a name="Limitations" id="Limitations"></a>
 <h2>Limitations</h2>