Posted to commits@lucene.apache.org by si...@apache.org on 2011/03/30 11:17:42 UTC
svn commit: r1086876 [4/18] - in /lucene/dev/branches/realtime_search: ./
dev-tools/eclipse/ dev-tools/idea/ dev-tools/idea/.idea/libraries/
dev-tools/idea/lucene/ dev-tools/idea/solr/
dev-tools/idea/solr/contrib/analysis-extras/ dev-tools/idea/solr/co...
Modified: lucene/dev/branches/realtime_search/lucene/docs/fileformats.html
URL: http://svn.apache.org/viewvc/lucene/dev/branches/realtime_search/lucene/docs/fileformats.html?rev=1086876&r1=1086875&r2=1086876&view=diff
==============================================================================
--- lucene/dev/branches/realtime_search/lucene/docs/fileformats.html (original)
+++ lucene/dev/branches/realtime_search/lucene/docs/fileformats.html Wed Mar 30 09:17:25 2011
@@ -129,8 +129,11 @@ document.write("Last Published: " + docu
<div class="menuitem">
<a href="api/core/index.html">Core</a>
</div>
-<div onclick="SwitchMenu('menu_1.1.3.3', 'skin/')" id="menu_1.1.3.3Title" class="menutitle">Contrib</div>
-<div id="menu_1.1.3.3" class="menuitemgroup">
+<div class="menuitem">
+<a href="api/test-framework/index.html">Test Framework</a>
+</div>
+<div onclick="SwitchMenu('menu_1.1.3.4', 'skin/')" id="menu_1.1.3.4Title" class="menutitle">Contrib</div>
+<div id="menu_1.1.3.4" class="menuitemgroup">
<div class="menuitem">
<a href="api/contrib-ant/index.html">Ant</a>
</div>
@@ -419,10 +422,14 @@ document.write("Last Published: " + docu
merge the new segment will write them,
uncompressed). See issue LUCENE-1960 for details.
</p>
+<p>
+ In version 3.1, segments record the code version
+ that created them. See LUCENE-2720 for details.
+ </p>
</div>
-<a name="N10034"></a><a name="Definitions"></a>
+<a name="N10037"></a><a name="Definitions"></a>
<h2 class="boxed">Definitions</h2>
<div class="section">
<p>
@@ -463,7 +470,7 @@ document.write("Last Published: " + docu
strings, the first naming the field, and the second naming text
within the field.
</p>
-<a name="N10054"></a><a name="Inverted Indexing"></a>
+<a name="N10057"></a><a name="Inverted Indexing"></a>
<h3 class="boxed">Inverted Indexing</h3>
<p>
The index stores statistics about terms in order
@@ -473,7 +480,7 @@ document.write("Last Published: " + docu
it. This is the inverse of the natural relationship, in which
documents list terms.
</p>
-<a name="N10060"></a><a name="Types of Fields"></a>
+<a name="N10063"></a><a name="Types of Fields"></a>
<h3 class="boxed">Types of Fields</h3>
<p>
In Lucene, fields may be <i>stored</i>, in which
@@ -487,7 +494,7 @@ document.write("Last Published: " + docu
to be indexed literally.
</p>
<p>See the <a href="api/core/org/apache/lucene/document/Field.html">Field</a> java docs for more information on Fields.</p>
-<a name="N1007D"></a><a name="Segments"></a>
+<a name="N10080"></a><a name="Segments"></a>
<h3 class="boxed">Segments</h3>
<p>
Lucene indexes may be composed of multiple sub-indexes, or
@@ -513,7 +520,7 @@ document.write("Last Published: " + docu
Searches may involve multiple segments and/or multiple indexes, each
index potentially composed of a set of segments.
</p>
-<a name="N1009B"></a><a name="Document Numbers"></a>
+<a name="N1009E"></a><a name="Document Numbers"></a>
<h3 class="boxed">Document Numbers</h3>
<p>
Internally, Lucene refers to documents by an integer <i>document
@@ -568,7 +575,7 @@ document.write("Last Published: " + docu
</div>
-<a name="N100C2"></a><a name="Overview"></a>
+<a name="N100C5"></a><a name="Overview"></a>
<h2 class="boxed">Overview</h2>
<div class="section">
<p>
@@ -667,7 +674,7 @@ document.write("Last Published: " + docu
</div>
-<a name="N10105"></a><a name="File Naming"></a>
+<a name="N10108"></a><a name="File Naming"></a>
<h2 class="boxed">File Naming</h2>
<div class="section">
<p>
@@ -694,7 +701,7 @@ document.write("Last Published: " + docu
</p>
</div>
-<a name="N10114"></a><a name="file-names"></a>
+<a name="N10117"></a><a name="file-names"></a>
<h2 class="boxed">Summary of File Extensions</h2>
<div class="section">
<p>The following table summarizes the names and extensions of the files in Lucene:
@@ -836,10 +843,10 @@ document.write("Last Published: " + docu
</div>
-<a name="N101FE"></a><a name="Primitive Types"></a>
+<a name="N10201"></a><a name="Primitive Types"></a>
<h2 class="boxed">Primitive Types</h2>
<div class="section">
-<a name="N10203"></a><a name="Byte"></a>
+<a name="N10206"></a><a name="Byte"></a>
<h3 class="boxed">Byte</h3>
<p>
The most primitive type
@@ -847,7 +854,7 @@ document.write("Last Published: " + docu
other data types are defined as sequences
of bytes, so file formats are byte-order independent.
</p>
-<a name="N1020C"></a><a name="UInt32"></a>
+<a name="N1020F"></a><a name="UInt32"></a>
<h3 class="boxed">UInt32</h3>
<p>
32-bit unsigned integers are written as four
@@ -857,7 +864,7 @@ document.write("Last Published: " + docu
UInt32 --> <Byte><sup>4</sup>
</p>
-<a name="N1021B"></a><a name="Uint64"></a>
+<a name="N1021E"></a><a name="Uint64"></a>
<h3 class="boxed">UInt64</h3>
<p>
64-bit unsigned integers are written as eight
@@ -866,7 +873,7 @@ document.write("Last Published: " + docu
<p>UInt64 --> <Byte><sup>8</sup>
</p>
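The hunks above truncate the UInt32 and UInt64 descriptions before they state the byte order. As a minimal sketch, assuming the high-order-byte-first layout that Lucene's own data output uses, the two fixed-width types could be written and read as follows. The class and method names here are hypothetical and are not part of Lucene.

```java
// Hypothetical helper, not a Lucene class: illustrates fixed-width
// unsigned integers stored as a sequence of bytes.
final class UIntSketch {
    // UInt32 --> <Byte>^4, assuming the high-order byte comes first.
    static byte[] writeUInt32(long value) {
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("value does not fit in 32 unsigned bits");
        }
        return new byte[] {
            (byte) (value >>> 24), (byte) (value >>> 16),
            (byte) (value >>> 8), (byte) value
        };
    }

    // Reassemble the four bytes; masking with 0xFF undoes Java's sign extension.
    static long readUInt32(byte[] b) {
        return ((b[0] & 0xFFL) << 24) | ((b[1] & 0xFFL) << 16)
             | ((b[2] & 0xFFL) << 8) | (b[3] & 0xFFL);
    }

    // UInt64 --> <Byte>^8, same ordering; the long is treated as raw 64 bits.
    static byte[] writeUInt64(long value) {
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (value >>> (56 - 8 * i));
        }
        return out;
    }

    static long readUInt64(byte[] b) {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            value = (value << 8) | (b[i] & 0xFFL);
        }
        return value;
    }
}
```

Because every value is defined as an explicit byte sequence, the result does not depend on the byte order of the machine that wrote it.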
-<a name="N1022A"></a><a name="VInt"></a>
+<a name="N1022D"></a><a name="VInt"></a>
<h3 class="boxed">VInt</h3>
<p>
A variable-length format for positive integers is
@@ -1416,13 +1423,13 @@ document.write("Last Published: " + docu
This provides compression while still being
efficient to decode.
</p>
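The VInt description is largely elided between the hunks above. As a rough sketch, assuming the usual layout (seven data bits per byte, low-order group first, high bit set when more bytes follow), encoding and decoding might look like this. The class and method names are hypothetical, not Lucene API.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not a Lucene class: illustrates a variable-length
// integer with seven data bits per byte and a continuation flag.
final class VIntSketch {
    // Emit the low seven bits first; set the high bit while bits remain.
    static byte[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch expects a non-negative value");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value); // final byte, high bit clear
        return out.toByteArray();
    }

    // Accumulate seven bits per byte until a byte with a clear high bit.
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated VInt");
    }
}
```

Under these assumptions, values below 128 take a single byte, which is where the compression comes from, and decoding needs only a shift and a mask per byte.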
-<a name="N1050F"></a><a name="Chars"></a>
+<a name="N10512"></a><a name="Chars"></a>
<h3 class="boxed">Chars</h3>
<p>
Lucene writes Unicode
character sequences as UTF-8 encoded bytes.
</p>
-<a name="N10518"></a><a name="String"></a>
+<a name="N1051B"></a><a name="String"></a>
<h3 class="boxed">String</h3>
<p>
Lucene writes strings as UTF-8 encoded bytes.
@@ -1435,10 +1442,10 @@ document.write("Last Published: " + docu
</div>
-<a name="N10525"></a><a name="Compound Types"></a>
+<a name="N10528"></a><a name="Compound Types"></a>
<h2 class="boxed">Compound Types</h2>
<div class="section">
-<a name="N1052A"></a><a name="MapStringString"></a>
+<a name="N1052D"></a><a name="MapStringString"></a>
<h3 class="boxed">Map<String,String></h3>
<p>
In a couple of places Lucene stores a Map
@@ -1451,13 +1458,13 @@ document.write("Last Published: " + docu
</div>
-<a name="N1053A"></a><a name="Per-Index Files"></a>
+<a name="N1053D"></a><a name="Per-Index Files"></a>
<h2 class="boxed">Per-Index Files</h2>
<div class="section">
<p>
The files in this section exist one-per-index.
</p>
-<a name="N10542"></a><a name="Segments File"></a>
+<a name="N10545"></a><a name="Segments File"></a>
<h3 class="boxed">Segments File</h3>
<p>
The active segments in the index are stored in the
@@ -1498,8 +1505,8 @@ document.write("Last Published: " + docu
</p>
<p>
-<b>2.9</b>
- Segments --> Format, Version, NameCounter, SegCount, <SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
+<b>3.1</b>
+ Segments --> Format, Version, NameCounter, SegCount, <SegVersion, SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
NormGen<sup>NumField</sup>,
IsCompoundFile, DeletionCount, HasProx, Diagnostics><sup>SegCount</sup>, CommitUserData, Checksum
</p>
@@ -1511,7 +1518,7 @@ document.write("Last Published: " + docu
Version, DelGen, NormGen, Checksum --> Int64
</p>
<p>
- SegName, DocStoreSegment --> String
+ SegVersion, SegName, DocStoreSegment --> String
</p>
<p>
Diagnostics --> Map<String,String>
@@ -1534,6 +1541,9 @@ document.write("Last Published: " + docu
NameCounter is used to generate names for new segment files.
</p>
<p>
+ SegVersion is the code version that created the segment.
+ </p>
+<p>
SegName is the name of the segment, and is used as the file name prefix
for all of the files that compose the segment's index.
</p>
@@ -1624,7 +1634,7 @@ document.write("Last Published: " + docu
Lucene version, OS, Java version, why the segment
was created (merge, flush, addIndexes), etc.
</p>
-<a name="N105C7"></a><a name="Lock File"></a>
+<a name="N105CD"></a><a name="Lock File"></a>
<h3 class="boxed">Lock File</h3>
<p>
The write lock, which is stored in the index
@@ -1638,14 +1648,14 @@ document.write("Last Published: " + docu
documents). This lock file ensures that only one
writer is modifying the index at a time.
</p>
-<a name="N105D0"></a><a name="Deletable File"></a>
+<a name="N105D6"></a><a name="Deletable File"></a>
<h3 class="boxed">Deletable File</h3>
<p>
A writer dynamically computes
the files that are deletable, instead, so no file
is written.
</p>
-<a name="N105D9"></a><a name="Compound Files"></a>
+<a name="N105DF"></a><a name="Compound Files"></a>
<h3 class="boxed">Compound Files</h3>
<p>Starting with Lucene 1.4 the compound file format became default. This
is simply a container for all files described in the next section
@@ -1672,14 +1682,14 @@ document.write("Last Published: " + docu
</div>
-<a name="N10601"></a><a name="Per-Segment Files"></a>
+<a name="N10607"></a><a name="Per-Segment Files"></a>
<h2 class="boxed">Per-Segment Files</h2>
<div class="section">
<p>
The remaining files are all per-segment, and are
thus defined by suffix.
</p>
-<a name="N10609"></a><a name="Fields"></a>
+<a name="N1060F"></a><a name="Fields"></a>
<h3 class="boxed">Fields</h3>
<p>
@@ -1873,7 +1883,7 @@ document.write("Last Published: " + docu
</li>
</ol>
-<a name="N106B0"></a><a name="Term Dictionary"></a>
+<a name="N106B6"></a><a name="Term Dictionary"></a>
<h3 class="boxed">Term Dictionary</h3>
<p>
The term dictionary is represented as two files:
@@ -2065,7 +2075,7 @@ document.write("Last Published: " + docu
</li>
</ol>
-<a name="N10734"></a><a name="Frequencies"></a>
+<a name="N1073A"></a><a name="Frequencies"></a>
<h3 class="boxed">Frequencies</h3>
<p>
The .frq file contains the lists of documents
@@ -2193,7 +2203,7 @@ document.write("Last Published: " + docu
entry in level-1. In the example, entry 15 on level 1 has a pointer to entry 15 on level 0, and entry 31 on level 1 has a pointer
to entry 31 on level 0.
</p>
-<a name="N107BC"></a><a name="Positions"></a>
+<a name="N107C2"></a><a name="Positions"></a>
<h3 class="boxed">Positions</h3>
<p>
The .prx file contains the lists of positions that
@@ -2263,7 +2273,7 @@ document.write("Last Published: " + docu
Payload. If PayloadLength is not stored, then this Payload has the same
length as the Payload at the previous position.
</p>
-<a name="N107F8"></a><a name="Normalization Factors"></a>
+<a name="N107FE"></a><a name="Normalization Factors"></a>
<h3 class="boxed">Normalization Factors</h3>
<p>There's a single .nrm file containing all norms:
</p>
@@ -2343,7 +2353,7 @@ document.write("Last Published: " + docu
</p>
<p>Separate norm files are created (when adequate) for both compound and non compound segments.
</p>
-<a name="N10849"></a><a name="Term Vectors"></a>
+<a name="N1084F"></a><a name="Term Vectors"></a>
<h3 class="boxed">Term Vectors</h3>
<p>
Term Vector support is optional on a field by
@@ -2479,7 +2489,7 @@ document.write("Last Published: " + docu
</li>
</ol>
-<a name="N108E5"></a><a name="Deleted Documents"></a>
+<a name="N108EB"></a><a name="Deleted Documents"></a>
<h3 class="boxed">Deleted Documents</h3>
<p>The .del file is
optional, and only exists when a segment contains deletions.
@@ -2543,7 +2553,7 @@ document.write("Last Published: " + docu
</div>
-<a name="N1091F"></a><a name="Limitations"></a>
+<a name="N10925"></a><a name="Limitations"></a>
<h2 class="boxed">Limitations</h2>
<div class="section">
<p>