Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/04/20 14:33:40 UTC

[GitHub] [lucene] mocobeta commented on a change in pull request #90: LUCENE-9353: revise format documentation of Lucene90BlockTreeTermsWriter

mocobeta commented on a change in pull request #90:
URL: https://github.com/apache/lucene/pull/90#discussion_r616743197



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java
##########
@@ -140,24 +135,48 @@
  * <ul>
  *   <li>Header is a {@link CodecUtil#writeHeader CodecHeader} storing the version information for
  *       the BlockTree implementation.
- *   <li>DirOffset is a pointer to the FieldSummary section.
  *   <li>DocFreq is the count of documents which contain the term.
  *   <li>TotalTermFreq is the total number of occurrences of the term. This is encoded as the
  *       difference between the total number of occurrences and the DocFreq.
+ *   <li>PostingsHeader and TermMetadata are plugged into by the specific postings implementation:
+ *       these contain arbitrary per-file data (such as parameters or versioning information) and
+ *       per-term data (such as pointers to inverted files).
+ *   <li>For inner nodes of the tree, every entry will steal one bit to mark whether it points to
+ *       child nodes(sub-block). If so, the corresponding TermStats and TermMetaData are omitted
+ * </ul>
+ *
+ * <p><a id="Termmetadata"></a>
+ *
+ * <h2>Term Metadata</h2>
+ *
+ * <p>The .tmd file contains the list of term metadata (such as FST index metadata) and field level
+ * statistics (such as sum of total term freq).
+ *
+ * <ul>
+ *   <li>TermsMeta (.tmd) --&gt; Header, NumFields, &lt;FieldStats&gt;<sup>NumFields</sup>,
+ *       TermIndexLength, TermDictLength, Footer
+ *   <li>FieldStats --&gt; FieldNumber, NumTerms, RootCodeLength, Byte<sup>RootCodeLength</sup>,
+ *       SumTotalTermFreq?, SumDocFreq, DocCount, MinTerm, MaxTerm, IndexStartFP, FSTHeader,

Review comment:
       I'm not actually the author of this line (I only moved it here from the section above), but the specification looks correct to me. See:
   https://github.com/apache/lucene/blob/5592d582b856c99df4839172b40733c18c6094e9/lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java#L1108-L1111



