You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@orc.apache.org by GitBox <gi...@apache.org> on 2022/07/04 02:17:38 UTC

[GitHub] [orc] wgtmac commented on a diff in pull request #1158: ORC-961: [C++] expose related metrics of the reader

wgtmac commented on code in PR #1158:
URL: https://github.com/apache/orc/pull/1158#discussion_r912578816


##########
c++/src/sargs/SargsApplier.cc:
##########
@@ -173,16 +174,23 @@ namespace orc {
     if (!ret) {
       // reset mNextSkippedRows when the current stripe does not satisfy the PPD
       mNextSkippedRows.clear();
+      if (mMetrics != nullptr) {
+        mMetrics->EvaluatedRowGroupCount.fetch_add(stripeRowGroupCount);
+      }
     }
     return ret;
   }
 
-  bool SargsApplier::evaluateFileStatistics(const proto::Footer& footer) {
+  bool SargsApplier::evaluateFileStatistics(const proto::Footer& footer,
+                                            uint64_t fileRowGroupCount) {
     if (!mHasEvaluatedFileStats) {
       if (footer.statistics_size() == 0) {
         mFileStatsEvalResult = true;
       } else {
         mFileStatsEvalResult = evaluateColumnStatistics(footer.statistics());
+        if (!mFileStatsEvalResult && mMetrics != nullptr) {
+          mMetrics->EvaluatedRowGroupCount.fetch_add(fileRowGroupCount);

Review Comment:
   The unit of input split is stripe. If the current input split does not contain all stripes in the file, it is incorrect to count row groups of the whole file.



##########
c++/src/Reader.cc:
##########
@@ -286,6 +287,10 @@ namespace orc {
           lastStripe = i + 1;
         }
       }
+      if (footer->rowindexstride() > 0) {
+        fileRowGroupCount +=

Review Comment:
   I'd suggest to rename fileRowGroupCount to something like numSplitRowGroups,  numRequestedRowGroups or numRowGroupsInRange. Then refine it to count stripe in range only.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@orc.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org