You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@lucene.apache.org by rm...@apache.org on 2013/02/17 02:16:53 UTC
svn commit: r1446988 - in
/lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs:
lucene41/package.html lucene42/Lucene42DocValuesFormat.java
lucene42/package.html
Author: rmuir
Date: Sun Feb 17 01:16:53 2013
New Revision: 1446988
URL: http://svn.apache.org/r1446988
Log:
file formats
Modified:
lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene41/package.html
lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.java
lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/package.html
Modified: lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene41/package.html
URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene41/package.html?rev=1446988&r1=1446987&r2=1446988&view=diff
==============================================================================
--- lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene41/package.html (original)
+++ lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene41/package.html Sun Feb 17 01:16:53 2013
@@ -375,7 +375,8 @@ can optionally be indexed into the posti
term vectors.</li>
<li>In version 4.1, the format of the postings list changed to use either
of FOR compression or variable-byte encoding, depending upon the frequency
-of the term.</li>
+of the term. Terms appearing only once were changed to inline directly into
+the term dictionary. Stored fields are compressed by default. </li>
</ul>
<a name="Limitations" id="Limitations"></a>
<h2>Limitations</h2>
Modified: lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.java?rev=1446988&r1=1446987&r2=1446988&view=diff
==============================================================================
--- lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.java (original)
+++ lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/Lucene42DocValuesFormat.java Sun Feb 17 01:16:53 2013
@@ -34,7 +34,7 @@ import org.apache.lucene.util.packed.Blo
/**
* Lucene 4.2 DocValues format.
* <p>
- * Encodes the three per-document value types (Numeric,Binary,Sorted) with five basic strategies.
+ * Encodes the four per-document value types (Numeric,Binary,Sorted,SortedSet) with seven basic strategies.
* <p>
* <ul>
* <li>Delta-compressed Numerics: per-document integers written in blocks of 4096. For each block
@@ -51,7 +51,9 @@ import org.apache.lucene.util.packed.Blo
* start for the block, and the average (expected) delta per entry. For each document the
* deviation from the delta (actual - expected) is written.
* <li>Sorted: an FST mapping deduplicated terms to ordinals is written, along with the per-document
- * ordinals written using one of the numeric stratgies above.
+ * ordinals written using one of the numeric strategies above.
+ * <li>SortedSet: an FST mapping deduplicated terms to ordinals is written, along with the per-document
+ * ordinal list written using one of the binary strategies above.
* </ul>
* <p>
* Files:
@@ -77,6 +79,8 @@ import org.apache.lucene.util.packed.Blo
* </ul>
* <p>Sorted fields have two entries: a SortedEntry with the FST metadata,
* and an ordinary NumericEntry for the document-to-ord metadata.</p>
+ * <p>SortedSet fields have two entries: a SortedEntry with the FST metadata,
+ * and an ordinary BinaryEntry for the document-to-ord-list metadata.</p>
* <p>FieldNumber of -1 indicates the end of metadata.</p>
* <p>EntryType is a 0 (NumericEntry), 1 (BinaryEntry, or 2 (SortedEntry)</p>
* <p>DataOffset is the pointer to the start of the data in the DocValues data (.dvd)</p>
@@ -107,6 +111,8 @@ import org.apache.lucene.util.packed.Blo
* <li>UncompressedNumerics --> {@link DataOutput#writeByte Byte}<sup>maxdoc</sup></li>
* <li>Addresses --> {@link MonotonicBlockPackedWriter MonotonicBlockPackedInts(blockSize=4096)}</li>
* </ul>
+ * <p>SortedSet entries store the list of ordinals in their BinaryData as a
+ * sequences of increasing {@link DataOutput#writeVLong vLong}s, delta-encoded.</p>
* </ol>
*/
public final class Lucene42DocValuesFormat extends DocValuesFormat {
Modified: lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/package.html
URL: http://svn.apache.org/viewvc/lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/package.html?rev=1446988&r1=1446987&r2=1446988&view=diff
==============================================================================
--- lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/package.html (original)
+++ lucene/dev/branches/lucene4765/lucene/core/src/java/org/apache/lucene/codecs/lucene42/package.html Sun Feb 17 01:16:53 2013
@@ -375,7 +375,11 @@ can optionally be indexed into the posti
term vectors.</li>
<li>In version 4.1, the format of the postings list changed to use either
of FOR compression or variable-byte encoding, depending upon the frequency
-of the term.</li>
+of the term. Terms appearing only once were changed to inline directly into
+the term dictionary. Stored fields are compressed by default. </li>
+<li>In version 4.2, term vectors are compressed by default. DocValues has
+a new multi-valued type (SortedSet), that can be used for faceting/grouping/joining
+on multi-valued fields.</li>
</ul>
<a name="Limitations" id="Limitations"></a>
<h2>Limitations</h2>