Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/04/19 13:40:26 UTC

[GitHub] [lucene] jpountz commented on a change in pull request #90: LUCENE-9353: revise format documentation of Lucene90BlockTreeTermsWriter

jpountz commented on a change in pull request #90:
URL: https://github.com/apache/lucene/pull/90#discussion_r615854333



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java
##########
@@ -140,24 +135,48 @@
  * <ul>
  *   <li>Header is a {@link CodecUtil#writeHeader CodecHeader} storing the version information for
  *       the BlockTree implementation.
- *   <li>DirOffset is a pointer to the FieldSummary section.
  *   <li>DocFreq is the count of documents which contain the term.
  *   <li>TotalTermFreq is the total number of occurrences of the term. This is encoded as the
  *       difference between the total number of occurrences and the DocFreq.
+ *   <li>PostingsHeader and TermMetadata are plugged into by the specific postings implementation:
+ *       these contain arbitrary per-file data (such as parameters or versioning information) and
+ *       per-term data (such as pointers to inverted files).
+ *   <li>For inner nodes of the tree, every entry will steal one bit to mark whether it points to
+ *       child nodes(sub-block). If so, the corresponding TermStats and TermMetaData are omitted

Review comment:
       Adding a trailing dot for consistency with other items.
   
   ```suggestion
    *       child nodes(sub-block). If so, the corresponding TermStats and TermMetaData are omitted.
   ```
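
To illustrate the "steal one bit" wording in the item above, here is a minimal Java sketch. It assumes the flag lives in the lowest bit of an entry's suffix length; the diff only says that each entry gives up one bit, so the choice of field, the class name `EntryFlagSketch`, and its methods are hypothetical and not taken from `Lucene90BlockTreeTermsWriter`.

```java
// Hypothetical sketch, not the Lucene implementation.
// Assumption: the stolen bit is the lowest bit of the entry's suffix length.
final class EntryFlagSketch {

  /** Packs a suffix length with a flag bit: 1 = entry points to a sub-block, 0 = entry is a term. */
  static int encode(int suffixLength, boolean isSubBlock) {
    if (suffixLength < 0 || suffixLength > (Integer.MAX_VALUE >>> 1)) {
      throw new IllegalArgumentException("suffixLength out of range: " + suffixLength);
    }
    return (suffixLength << 1) | (isSubBlock ? 1 : 0);
  }

  /** Returns true if the packed value marks a sub-block entry. */
  static boolean isSubBlock(int code) {
    return (code & 1) != 0;
  }

  /** Recovers the suffix length by dropping the flag bit. */
  static int suffixLength(int code) {
    return code >>> 1;
  }
}
```

Under this assumption a reader checks the flag first, and for sub-block entries skips TermStats and TermMetaData because they were never written.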

##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java
##########
@@ -140,24 +135,48 @@
  * <ul>
  *   <li>Header is a {@link CodecUtil#writeHeader CodecHeader} storing the version information for
  *       the BlockTree implementation.
- *   <li>DirOffset is a pointer to the FieldSummary section.
  *   <li>DocFreq is the count of documents which contain the term.
  *   <li>TotalTermFreq is the total number of occurrences of the term. This is encoded as the
  *       difference between the total number of occurrences and the DocFreq.
+ *   <li>PostingsHeader and TermMetadata are plugged into by the specific postings implementation:
+ *       these contain arbitrary per-file data (such as parameters or versioning information) and
+ *       per-term data (such as pointers to inverted files).
+ *   <li>For inner nodes of the tree, every entry will steal one bit to mark whether it points to
+ *       child nodes(sub-block). If so, the corresponding TermStats and TermMetaData are omitted
+ * </ul>
+ *
+ * <p><a id="Termmetadata"></a>
+ *
+ * <h2>Term Metadata</h2>
+ *
+ * <p>The .tmd file contains the list of term metadata (such as FST index metadata) and field level
+ * statistics (such as sum of total term freq).
+ *
+ * <ul>
+ *   <li>TermsMeta (.tmd) --&gt; Header, NumFields, &lt;FieldStats&gt;<sup>NumFields</sup>,
+ *       TermIndexLength, TermDictLength, Footer
+ *   <li>FieldStats --&gt; FieldNumber, NumTerms, RootCodeLength, Byte<sup>RootCodeLength</sup>,
+ *       SumTotalTermFreq?, SumDocFreq, DocCount, MinTerm, MaxTerm, IndexStartFP, FSTHeader,

Review comment:
       I think it is SumDocFreq, rather than SumTotalTermFreq, that is not always specified?
   
   ```suggestion
    *       SumTotalTermFreq, SumDocFreq?, DocCount, MinTerm, MaxTerm, IndexStartFP, FSTHeader,
   ```
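
A small Java sketch of what the `?` marker would mean for a reader if the suggestion is applied. It assumes that SumDocFreq is written only when the field indexes term frequencies, and that when it is omitted the reader reuses SumTotalTermFreq. The class `FieldStatsSketch`, the `hasFreqs` flag, and the `long[]` stand-in for the .tmd input are all hypothetical.

```java
// Hypothetical sketch, not the Lucene reader.
// Assumption: SumDocFreq is present only when frequencies are indexed;
// otherwise it is taken to equal SumTotalTermFreq.
final class FieldStatsSketch {
  final long sumTotalTermFreq;
  final long sumDocFreq;

  FieldStatsSketch(long sumTotalTermFreq, long sumDocFreq) {
    this.sumTotalTermFreq = sumTotalTermFreq;
    this.sumDocFreq = sumDocFreq;
  }

  /**
   * Reads the two statistics from an already-decoded sequence of values.
   * values[0] is always SumTotalTermFreq; values[1] is SumDocFreq when hasFreqs is true.
   */
  static FieldStatsSketch read(long[] values, boolean hasFreqs) {
    int pos = 0;
    long sumTotalTermFreq = values[pos++];
    // The optional field: consume it only when it was written.
    long sumDocFreq = hasFreqs ? values[pos++] : sumTotalTermFreq;
    return new FieldStatsSketch(sumTotalTermFreq, sumDocFreq);
  }
}
```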



