Posted to commits@lucene.apache.org by rm...@apache.org on 2011/03/02 23:14:32 UTC

svn commit: r1076433 [1/3] - in /lucene/dev/trunk/lucene: docs/fileformats.html docs/fileformats.pdf src/site/src/documentation/content/xdocs/fileformats.xml

Author: rmuir
Date: Wed Mar  2 22:14:32 2011
New Revision: 1076433

URL: http://svn.apache.org/viewvc?rev=1076433&view=rev
Log:
LUCENE-2720: update fileformats

Modified:
    lucene/dev/trunk/lucene/docs/fileformats.html
    lucene/dev/trunk/lucene/docs/fileformats.pdf
    lucene/dev/trunk/lucene/src/site/src/documentation/content/xdocs/fileformats.xml

Modified: lucene/dev/trunk/lucene/docs/fileformats.html
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/docs/fileformats.html?rev=1076433&r1=1076432&r2=1076433&view=diff
==============================================================================
--- lucene/dev/trunk/lucene/docs/fileformats.html (original)
+++ lucene/dev/trunk/lucene/docs/fileformats.html Wed Mar  2 22:14:32 2011
@@ -422,10 +422,14 @@ document.write("Last Published: " + docu
 	        merge the new segment will write them,
 	        uncompressed). See issue LUCENE-1960 for details.
             </p>
+<p>
+            In version 3.1, segments records the code version
+            that created them. See LUCENE-2720 for details.
+           </p>
 </div>
 
         
-<a name="N10034"></a><a name="Definitions"></a>
+<a name="N10037"></a><a name="Definitions"></a>
 <h2 class="boxed">Definitions</h2>
 <div class="section">
 <p>
@@ -466,7 +470,7 @@ document.write("Last Published: " + docu
                 strings, the first naming the field, and the second naming text
                 within the field.
             </p>
-<a name="N10054"></a><a name="Inverted Indexing"></a>
+<a name="N10057"></a><a name="Inverted Indexing"></a>
 <h3 class="boxed">Inverted Indexing</h3>
 <p>
                     The index stores statistics about terms in order
@@ -476,7 +480,7 @@ document.write("Last Published: " + docu
                     it.  This is the inverse of the natural relationship, in which
                     documents list terms.
                 </p>
-<a name="N10060"></a><a name="Types of Fields"></a>
+<a name="N10063"></a><a name="Types of Fields"></a>
 <h3 class="boxed">Types of Fields</h3>
 <p>
                     In Lucene, fields may be <i>stored</i>, in which
@@ -490,7 +494,7 @@ document.write("Last Published: " + docu
                     to be indexed literally.
                 </p>
 <p>See the <a href="api/core/org/apache/lucene/document/Field.html">Field</a> java docs for more information on Fields.</p>
-<a name="N1007D"></a><a name="Segments"></a>
+<a name="N10080"></a><a name="Segments"></a>
 <h3 class="boxed">Segments</h3>
 <p>
                     Lucene indexes may be composed of multiple sub-indexes, or
@@ -516,7 +520,7 @@ document.write("Last Published: " + docu
                     Searches may involve multiple segments and/or multiple indexes, each
                     index potentially composed of a set of segments.
                 </p>
-<a name="N1009B"></a><a name="Document Numbers"></a>
+<a name="N1009E"></a><a name="Document Numbers"></a>
 <h3 class="boxed">Document Numbers</h3>
 <p>
                     Internally, Lucene refers to documents by an integer <i>document
@@ -571,7 +575,7 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N100C2"></a><a name="Overview"></a>
+<a name="N100C5"></a><a name="Overview"></a>
 <h2 class="boxed">Overview</h2>
 <div class="section">
 <p>
@@ -670,7 +674,7 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N10105"></a><a name="File Naming"></a>
+<a name="N10108"></a><a name="File Naming"></a>
 <h2 class="boxed">File Naming</h2>
 <div class="section">
 <p>
@@ -697,7 +701,7 @@ document.write("Last Published: " + docu
             </p>
 </div>
       
-<a name="N10114"></a><a name="file-names"></a>
+<a name="N10117"></a><a name="file-names"></a>
 <h2 class="boxed">Summary of File Extensions</h2>
 <div class="section">
 <p>The following table summarizes the names and extensions of the files in Lucene:
@@ -839,10 +843,10 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N101FE"></a><a name="Primitive Types"></a>
+<a name="N10201"></a><a name="Primitive Types"></a>
 <h2 class="boxed">Primitive Types</h2>
 <div class="section">
-<a name="N10203"></a><a name="Byte"></a>
+<a name="N10206"></a><a name="Byte"></a>
 <h3 class="boxed">Byte</h3>
 <p>
                     The most primitive type
@@ -850,7 +854,7 @@ document.write("Last Published: " + docu
                     other data types are defined as sequences
                     of bytes, so file formats are byte-order independent.
                 </p>
-<a name="N1020C"></a><a name="UInt32"></a>
+<a name="N1020F"></a><a name="UInt32"></a>
 <h3 class="boxed">UInt32</h3>
 <p>
                     32-bit unsigned integers are written as four
@@ -860,7 +864,7 @@ document.write("Last Published: " + docu
                     UInt32    --&gt; &lt;Byte&gt;<sup>4</sup>
                 
 </p>
-<a name="N1021B"></a><a name="Uint64"></a>
+<a name="N1021E"></a><a name="Uint64"></a>
 <h3 class="boxed">Uint64</h3>
 <p>
                     64-bit unsigned integers are written as eight
@@ -869,7 +873,7 @@ document.write("Last Published: " + docu
 <p>UInt64    --&gt; &lt;Byte&gt;<sup>8</sup>
                 
 </p>
-<a name="N1022A"></a><a name="VInt"></a>
+<a name="N1022D"></a><a name="VInt"></a>
 <h3 class="boxed">VInt</h3>
 <p>
                     A variable-length format for positive integers is
@@ -1419,13 +1423,13 @@ document.write("Last Published: " + docu
                     This provides compression while still being
                     efficient to decode.
                 </p>
-<a name="N1050F"></a><a name="Chars"></a>
+<a name="N10512"></a><a name="Chars"></a>
 <h3 class="boxed">Chars</h3>
 <p>
                     Lucene writes unicode
                     character sequences as UTF-8 encoded bytes.
                 </p>
-<a name="N10518"></a><a name="String"></a>
+<a name="N1051B"></a><a name="String"></a>
 <h3 class="boxed">String</h3>
 <p>
 		    Lucene writes strings as UTF-8 encoded bytes.
@@ -1438,10 +1442,10 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N10525"></a><a name="Compound Types"></a>
+<a name="N10528"></a><a name="Compound Types"></a>
 <h2 class="boxed">Compound Types</h2>
 <div class="section">
-<a name="N1052A"></a><a name="MapStringString"></a>
+<a name="N1052D"></a><a name="MapStringString"></a>
 <h3 class="boxed">Map&lt;String,String&gt;</h3>
 <p>
 		    In a couple places Lucene stores a Map
@@ -1454,13 +1458,13 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N1053A"></a><a name="Per-Index Files"></a>
+<a name="N1053D"></a><a name="Per-Index Files"></a>
 <h2 class="boxed">Per-Index Files</h2>
 <div class="section">
 <p>
                 The files in this section exist one-per-index.
             </p>
-<a name="N10542"></a><a name="Segments File"></a>
+<a name="N10545"></a><a name="Segments File"></a>
 <h3 class="boxed">Segments File</h3>
 <p>
                     The active segments in the index are stored in the
@@ -1501,8 +1505,8 @@ document.write("Last Published: " + docu
                 </p>
 <p>
                     
-<b>2.9</b>
-                    Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
+<b>3.1</b>
+                    Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegVersion, SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
                     NormGen<sup>NumField</sup>,
                     IsCompoundFile, DeletionCount, HasProx, Diagnostics&gt;<sup>SegCount</sup>, CommitUserData, Checksum
                 </p>
@@ -1514,7 +1518,7 @@ document.write("Last Published: " + docu
                     Version, DelGen, NormGen, Checksum --&gt; Int64
                 </p>
 <p>
-                   SegName, DocStoreSegment --&gt; String
+                   SegVersion, SegName, DocStoreSegment --&gt; String
                 </p>
 <p>
 		   Diagnostics --&gt; Map&lt;String,String&gt;
@@ -1537,6 +1541,9 @@ document.write("Last Published: " + docu
                     NameCounter is used to generate names for new segment files.
                 </p>
 <p>
+                    SegVersion is the code version that created the segment.
+                </p>
+<p>
                     SegName is the name of the segment, and is used as the file name prefix
                     for all of the files that compose the segment's index.
                 </p>
@@ -1627,7 +1634,7 @@ document.write("Last Published: " + docu
 		    Lucene version, OS, Java version, why the segment
 		    was created (merge, flush, addIndexes), etc.
                 </p>
-<a name="N105C7"></a><a name="Lock File"></a>
+<a name="N105CD"></a><a name="Lock File"></a>
 <h3 class="boxed">Lock File</h3>
 <p>
                     The write lock, which is stored in the index
@@ -1641,14 +1648,14 @@ document.write("Last Published: " + docu
                     documents).  This lock file ensures that only one
                     writer is modifying the index at a time.
                 </p>
-<a name="N105D0"></a><a name="Deletable File"></a>
+<a name="N105D6"></a><a name="Deletable File"></a>
 <h3 class="boxed">Deletable File</h3>
 <p>
                     A writer dynamically computes
                     the files that are deletable, instead, so no file
                     is written.
                 </p>
-<a name="N105D9"></a><a name="Compound Files"></a>
+<a name="N105DF"></a><a name="Compound Files"></a>
 <h3 class="boxed">Compound Files</h3>
 <p>Starting with Lucene 1.4 the compound file format became default. This
                     is simply a container for all files described in the next section
@@ -1675,14 +1682,14 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N10601"></a><a name="Per-Segment Files"></a>
+<a name="N10607"></a><a name="Per-Segment Files"></a>
 <h2 class="boxed">Per-Segment Files</h2>
 <div class="section">
 <p>
                 The remaining files are all per-segment, and are
                 thus defined by suffix.
             </p>
-<a name="N10609"></a><a name="Fields"></a>
+<a name="N1060F"></a><a name="Fields"></a>
 <h3 class="boxed">Fields</h3>
 <p>
                     
@@ -1876,7 +1883,7 @@ document.write("Last Published: " + docu
 </li>
                 
 </ol>
-<a name="N106B0"></a><a name="Term Dictionary"></a>
+<a name="N106B6"></a><a name="Term Dictionary"></a>
 <h3 class="boxed">Term Dictionary</h3>
 <p>
                     The term dictionary is represented as two files:
@@ -2068,7 +2075,7 @@ document.write("Last Published: " + docu
 </li>
                 
 </ol>
-<a name="N10734"></a><a name="Frequencies"></a>
+<a name="N1073A"></a><a name="Frequencies"></a>
 <h3 class="boxed">Frequencies</h3>
 <p>
                     The .frq file contains the lists of documents
@@ -2196,7 +2203,7 @@ document.write("Last Published: " + docu
                    entry in level-1. In the example has entry 15 on level 1 a pointer to entry 15 on level 0 and entry 31 on level 1 a pointer
                    to entry 31 on level 0.                   
                 </p>
-<a name="N107BC"></a><a name="Positions"></a>
+<a name="N107C2"></a><a name="Positions"></a>
 <h3 class="boxed">Positions</h3>
 <p>
                     The .prx file contains the lists of positions that
@@ -2266,7 +2273,7 @@ document.write("Last Published: " + docu
                     Payload. If PayloadLength is not stored, then this Payload has the same
                     length as the Payload at the previous position.
                 </p>
-<a name="N107F8"></a><a name="Normalization Factors"></a>
+<a name="N107FE"></a><a name="Normalization Factors"></a>
 <h3 class="boxed">Normalization Factors</h3>
 <p>There's a single .nrm file containing all norms:
                 </p>
@@ -2346,7 +2353,7 @@ document.write("Last Published: " + docu
                 </p>
 <p>Separate norm files are created (when adequate) for both compound and non compound segments.
                 </p>
-<a name="N10849"></a><a name="Term Vectors"></a>
+<a name="N1084F"></a><a name="Term Vectors"></a>
 <h3 class="boxed">Term Vectors</h3>
 <p>
 		  Term Vector support is an optional on a field by
@@ -2482,7 +2489,7 @@ document.write("Last Published: " + docu
 </li>
                 
 </ol>
-<a name="N108E5"></a><a name="Deleted Documents"></a>
+<a name="N108EB"></a><a name="Deleted Documents"></a>
 <h3 class="boxed">Deleted Documents</h3>
 <p>The .del file is
                     optional, and only exists when a segment contains deletions.
@@ -2546,7 +2553,7 @@ document.write("Last Published: " + docu
 </div>
 
         
-<a name="N1091F"></a><a name="Limitations"></a>
+<a name="N10925"></a><a name="Limitations"></a>
 <h2 class="boxed">Limitations</h2>
 <div class="section">
 <p>