Posted to commits@lucene.apache.org by si...@apache.org on 2011/03/28 12:50:48 UTC
svn commit: r1086181 [4/20] - in /lucene/dev/branches/docvalues: ./ dev-tools/eclipse/ dev-tools/idea/ dev-tools/idea/.idea/ dev-tools/idea/.idea/libraries/ dev-tools/idea/lucene/ dev-tools/idea/solr/ dev-tools/idea/solr/contrib/analysis-extras/ dev-to...

Modified: lucene/dev/branches/docvalues/lucene/docs/fileformats.html
URL: http://svn.apache.org/viewvc/lucene/dev/branches/docvalues/lucene/docs/fileformats.html?rev=1086181&r1=1086180&r2=1086181&view=diff
==============================================================================
--- lucene/dev/branches/docvalues/lucene/docs/fileformats.html (original)
+++ lucene/dev/branches/docvalues/lucene/docs/fileformats.html Mon Mar 28 10:50:28 2011
@@ -129,8 +129,11 @@ document.write("Last Published: " + docu
 <div class="menuitem">
 <a href="api/core/index.html">Core</a>
 </div>
-<div onclick="SwitchMenu('menu_1.1.3.3', 'skin/')" id="menu_1.1.3.3Title" class="menutitle">Contrib</div>
-<div id="menu_1.1.3.3" class="menuitemgroup">
+<div class="menuitem">
+<a href="api/test-framework/index.html">Test Framework</a>
+</div>
+<div onclick="SwitchMenu('menu_1.1.3.4', 'skin/')" id="menu_1.1.3.4Title" class="menutitle">Contrib</div>
+<div id="menu_1.1.3.4" class="menuitemgroup">
 <div class="menuitem">
 <a href="api/contrib-ant/index.html">Ant</a>
 </div>
@@ -419,10 +422,14 @@ document.write("Last Published: " + docu
 	        merge the new segment will write them,
 	        uncompressed). See issue LUCENE-1960 for details.
             </p>
+<p>
+            In version 3.1, segments record the code version
+            that created them. See LUCENE-2720 for details.
+           </p>
 </div>
 
         
-<a name="N10034"></a><a name="Definitions"></a>
+<a name="N10037"></a><a name="Definitions"></a>
 <h2 class="boxed">Definitions</h2>
 <div class="section">
 <p>
@@ -463,7 +470,7 @@ document.write("Last Published: " + docu
                 strings, the first naming the field, and the second naming text
                 within the field.
             </p>
-<a name="N10054"></a><a name="Inverted Indexing"></a>
+<a name="N10057"></a><a name="Inverted Indexing"></a>
 <h3 class="boxed">Inverted Indexing</h3>
 <p>
                     The index stores statistics about terms in order
@@ -473,7 +480,7 @@ document.write("Last Published: " + docu
                     it.  This is the inverse of the natural relationship, in which
                     documents list terms.
                 </p>
-<a name="N10060"></a><a name="Types of Fields"></a>
+<a name="N10063"></a><a name="Types of Fields"></a>
 <h3 class="boxed">Types of Fields</h3>
 <p>
                     In Lucene, fields may be <i>stored</i>, in which
@@ -487,7 +494,7 @@ document.write("Last Published: " + docu
                     to be indexed literally.
                 </p>
 <p>See the <a href="api/core/org/apache/lucene/document/Field.html">Field</a> java docs for more information on Fields.</p>
-<a name="N1007D"></a><a name="Segments"></a>
+<a name="N10080"></a><a name="Segments"></a>
 <h3 class="boxed">Segments</h3>
 <p>
                     Lucene indexes may be composed of multiple sub-indexes, or
@@ -513,7 +520,7 @@ document.write("Last Published: " + docu
                     Searches may involve multiple segments and/or multiple indexes, each
                     index potentially composed of a set of segments.
                 </p>
-<a name="N1009B"></a><a name="Document Numbers"></a>
+<a name="N1009E"></a><a name="Document Numbers"></a>
 <h3 class="boxed">Document Numbers</h3>
 <p>
                     Internally, Lucene refers to documents by an integer <i>document
@@ -568,7 +575,7 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N100C2"></a><a name="Overview"></a>
+<a name="N100C5"></a><a name="Overview"></a>
 <h2 class="boxed">Overview</h2>
 <div class="section">
 <p>
@@ -667,7 +674,7 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N10105"></a><a name="File Naming"></a>
+<a name="N10108"></a><a name="File Naming"></a>
 <h2 class="boxed">File Naming</h2>
 <div class="section">
 <p>
@@ -694,7 +701,7 @@ document.write("Last Published: " + docu
             </p>
 </div>
       
-<a name="N10114"></a><a name="file-names"></a>
+<a name="N10117"></a><a name="file-names"></a>
 <h2 class="boxed">Summary of File Extensions</h2>
 <div class="section">
 <p>The following table summarizes the names and extensions of the files in Lucene:
@@ -836,10 +843,10 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N101FE"></a><a name="Primitive Types"></a>
+<a name="N10201"></a><a name="Primitive Types"></a>
 <h2 class="boxed">Primitive Types</h2>
 <div class="section">
-<a name="N10203"></a><a name="Byte"></a>
+<a name="N10206"></a><a name="Byte"></a>
 <h3 class="boxed">Byte</h3>
 <p>
                     The most primitive type
@@ -847,7 +854,7 @@ document.write("Last Published: " + docu
                     other data types are defined as sequences
                     of bytes, so file formats are byte-order independent.
                 </p>
-<a name="N1020C"></a><a name="UInt32"></a>
+<a name="N1020F"></a><a name="UInt32"></a>
 <h3 class="boxed">UInt32</h3>
 <p>
                     32-bit unsigned integers are written as four
@@ -857,7 +864,7 @@ document.write("Last Published: " + docu
                     UInt32    --&gt; &lt;Byte&gt;<sup>4</sup>
                 
 </p>
-<a name="N1021B"></a><a name="Uint64"></a>
+<a name="N1021E"></a><a name="Uint64"></a>
 <h3 class="boxed">Uint64</h3>
 <p>
                     64-bit unsigned integers are written as eight
@@ -866,7 +873,7 @@ document.write("Last Published: " + docu
 <p>UInt64    --&gt; &lt;Byte&gt;<sup>8</sup>
                 
 </p>
-<a name="N1022A"></a><a name="VInt"></a>
+<a name="N1022D"></a><a name="VInt"></a>
 <h3 class="boxed">VInt</h3>
 <p>
                     A variable-length format for positive integers is
@@ -1416,13 +1423,13 @@ document.write("Last Published: " + docu
                     This provides compression while still being
                     efficient to decode.
                 </p>
-<a name="N1050F"></a><a name="Chars"></a>
+<a name="N10512"></a><a name="Chars"></a>
 <h3 class="boxed">Chars</h3>
 <p>
                     Lucene writes unicode
                     character sequences as UTF-8 encoded bytes.
                 </p>
-<a name="N10518"></a><a name="String"></a>
+<a name="N1051B"></a><a name="String"></a>
 <h3 class="boxed">String</h3>
 <p>
 		    Lucene writes strings as UTF-8 encoded bytes.
@@ -1435,10 +1442,10 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N10525"></a><a name="Compound Types"></a>
+<a name="N10528"></a><a name="Compound Types"></a>
 <h2 class="boxed">Compound Types</h2>
 <div class="section">
-<a name="N1052A"></a><a name="MapStringString"></a>
+<a name="N1052D"></a><a name="MapStringString"></a>
 <h3 class="boxed">Map&lt;String,String&gt;</h3>
 <p>
 		    In a couple places Lucene stores a Map
@@ -1451,13 +1458,13 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N1053A"></a><a name="Per-Index Files"></a>
+<a name="N1053D"></a><a name="Per-Index Files"></a>
 <h2 class="boxed">Per-Index Files</h2>
 <div class="section">
 <p>
                 The files in this section exist one-per-index.
             </p>
-<a name="N10542"></a><a name="Segments File"></a>
+<a name="N10545"></a><a name="Segments File"></a>
 <h3 class="boxed">Segments File</h3>
 <p>
                     The active segments in the index are stored in the
@@ -1498,8 +1505,8 @@ document.write("Last Published: " + docu
                 </p>
 <p>
                     
-<b>2.9</b>
-                    Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
+<b>3.1</b>
+                    Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegVersion, SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
                     NormGen<sup>NumField</sup>,
                     IsCompoundFile, DeletionCount, HasProx, Diagnostics&gt;<sup>SegCount</sup>, CommitUserData, Checksum
                 </p>
@@ -1511,7 +1518,7 @@ document.write("Last Published: " + docu
                     Version, DelGen, NormGen, Checksum --&gt; Int64
                 </p>
 <p>
-                   SegName, DocStoreSegment --&gt; String
+                   SegVersion, SegName, DocStoreSegment --&gt; String
                 </p>
 <p>
 		   Diagnostics --&gt; Map&lt;String,String&gt;
@@ -1534,6 +1541,9 @@ document.write("Last Published: " + docu
                     NameCounter is used to generate names for new segment files.
                 </p>
 <p>
+                    SegVersion is the code version that created the segment.
+                </p>
+<p>
                     SegName is the name of the segment, and is used as the file name prefix
                     for all of the files that compose the segment's index.
                 </p>
@@ -1624,7 +1634,7 @@ document.write("Last Published: " + docu
 		    Lucene version, OS, Java version, why the segment
 		    was created (merge, flush, addIndexes), etc.
                 </p>
-<a name="N105C7"></a><a name="Lock File"></a>
+<a name="N105CD"></a><a name="Lock File"></a>
 <h3 class="boxed">Lock File</h3>
 <p>
                     The write lock, which is stored in the index
@@ -1638,14 +1648,14 @@ document.write("Last Published: " + docu
                     documents).  This lock file ensures that only one
                     writer is modifying the index at a time.
                 </p>
-<a name="N105D0"></a><a name="Deletable File"></a>
+<a name="N105D6"></a><a name="Deletable File"></a>
 <h3 class="boxed">Deletable File</h3>
 <p>
                     A writer dynamically computes
                     the files that are deletable, instead, so no file
                     is written.
                 </p>
-<a name="N105D9"></a><a name="Compound Files"></a>
+<a name="N105DF"></a><a name="Compound Files"></a>
 <h3 class="boxed">Compound Files</h3>
 <p>Starting with Lucene 1.4 the compound file format became default. This
                     is simply a container for all files described in the next section
@@ -1672,14 +1682,14 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N10601"></a><a name="Per-Segment Files"></a>
+<a name="N10607"></a><a name="Per-Segment Files"></a>
 <h2 class="boxed">Per-Segment Files</h2>
 <div class="section">
 <p>
                 The remaining files are all per-segment, and are
                 thus defined by suffix.
             </p>
-<a name="N10609"></a><a name="Fields"></a>
+<a name="N1060F"></a><a name="Fields"></a>
 <h3 class="boxed">Fields</h3>
 <p>
                     
@@ -1873,7 +1883,7 @@ document.write("Last Published: " + docu
 </li>
                 
 </ol>
-<a name="N106B0"></a><a name="Term Dictionary"></a>
+<a name="N106B6"></a><a name="Term Dictionary"></a>
 <h3 class="boxed">Term Dictionary</h3>
 <p>
                     The term dictionary is represented as two files:
@@ -2065,7 +2075,7 @@ document.write("Last Published: " + docu
 </li>
                 
 </ol>
-<a name="N10734"></a><a name="Frequencies"></a>
+<a name="N1073A"></a><a name="Frequencies"></a>
 <h3 class="boxed">Frequencies</h3>
 <p>
                     The .frq file contains the lists of documents
@@ -2193,7 +2203,7 @@ document.write("Last Published: " + docu
                    entry in level-1. In the example has entry 15 on level 1 a pointer to entry 15 on level 0 and entry 31 on level 1 a pointer
                    to entry 31 on level 0.                   
                 </p>
-<a name="N107BC"></a><a name="Positions"></a>
+<a name="N107C2"></a><a name="Positions"></a>
 <h3 class="boxed">Positions</h3>
 <p>
                     The .prx file contains the lists of positions that
@@ -2263,7 +2273,7 @@ document.write("Last Published: " + docu
                     Payload. If PayloadLength is not stored, then this Payload has the same
                     length as the Payload at the previous position.
                 </p>
-<a name="N107F8"></a><a name="Normalization Factors"></a>
+<a name="N107FE"></a><a name="Normalization Factors"></a>
 <h3 class="boxed">Normalization Factors</h3>
 <p>There's a single .nrm file containing all norms:
                 </p>
@@ -2343,7 +2353,7 @@ document.write("Last Published: " + docu
                 </p>
 <p>Separate norm files are created (when adequate) for both compound and non compound segments.
                 </p>
-<a name="N10849"></a><a name="Term Vectors"></a>
+<a name="N1084F"></a><a name="Term Vectors"></a>
 <h3 class="boxed">Term Vectors</h3>
 <p>
 		  Term Vector support is an optional on a field by
@@ -2479,7 +2489,7 @@ document.write("Last Published: " + docu
 </li>
                 
 </ol>
-<a name="N108E5"></a><a name="Deleted Documents"></a>
+<a name="N108EB"></a><a name="Deleted Documents"></a>
 <h3 class="boxed">Deleted Documents</h3>
 <p>The .del file is
                     optional, and only exists when a segment contains deletions.
@@ -2543,7 +2553,7 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N1091F"></a><a name="Limitations"></a>
+<a name="N10925"></a><a name="Limitations"></a>
 <h2 class="boxed">Limitations</h2>
 <div class="section">
 <p>