Posted to issues@paimon.apache.org by "tsreaper (via GitHub)" <gi...@apache.org> on 2023/11/01 06:35:03 UTC

Re: [PR] [Core] support scan metrics [incubator-paimon]

tsreaper commented on code in PR #2170:
URL: https://github.com/apache/incubator-paimon/pull/2170#discussion_r1378440114


##########
paimon-core/src/main/java/org/apache/paimon/table/system/AuditLogTable.java:
##########
@@ -247,6 +248,12 @@ public SnapshotReader withBucketFilter(Filter<Integer> bucketFilter) {
             return this;
         }
 
+        @Override
+        public SnapshotReader withMetricRegistry(MetricRegistry registry) {
+            // won't register metric

Review Comment:
   Why not? `AuditLogTable` also scans data files, so its scan metrics should be registered too.



##########
paimon-core/src/main/java/org/apache/paimon/operation/AbstractFileStoreScan.java:
##########
@@ -237,19 +249,26 @@ private Pair<Snapshot, List<ManifestEntry>> doPlan(
             }
         }
 
+        long startDataFiles =
+                manifests.stream().mapToLong(f -> f.numAddedFiles() + f.numDeletedFiles()).sum();
+
+        AtomicLong cntEntries = new AtomicLong(0);
         Iterable<ManifestEntry> entries =
                 ParallellyExecuteUtils.parallelismBatchIterable(
                         files ->
                                 files.parallelStream()
                                         .filter(this::filterManifestFileMeta)
                                         .flatMap(m -> readManifest.apply(m).stream())
                                         .filter(this::filterByStats)
+                                        .peek(e -> cntEntries.getAndIncrement())

Review Comment:
   Why not increase `cntEntries` once, by the size of the collected list? With `peek`, `cntEntries` is incremented separately for every element in the list. Note that this stream is parallel, so that means many concurrent writes to the same counter, which hurts performance.
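
   A minimal sketch of the difference, using plain integers in place of `ManifestEntry` and an even-number filter in place of `filterByStats`. The class and method names here (`EntryCountSketch`, `countWithPeek`, `countPerBatch`) are hypothetical and only for illustration; this is not the actual Paimon code. Both variants arrive at the same total, but the second touches the shared counter once per batch instead of once per element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

// Hypothetical stand-in for the counting logic discussed above.
class EntryCountSketch {

    // Variant from the diff: every element that passes the filter
    // performs its own atomic increment from a parallel stream.
    static long countWithPeek(List<List<Integer>> batches) {
        AtomicLong counter = new AtomicLong(0);
        for (List<Integer> batch : batches) {
            batch.parallelStream()
                    .filter(v -> v % 2 == 0)
                    .peek(v -> counter.getAndIncrement())
                    .collect(Collectors.toList());
        }
        return counter.get();
    }

    // Suggested variant: collect first, then add the list size
    // to the counter in a single atomic update per batch.
    static long countPerBatch(List<List<Integer>> batches) {
        AtomicLong counter = new AtomicLong(0);
        for (List<Integer> batch : batches) {
            List<Integer> kept =
                    batch.parallelStream()
                            .filter(v -> v % 2 == 0)
                            .collect(Collectors.toList());
            counter.addAndGet(kept.size());
        }
        return counter.get();
    }

    public static void main(String[] args) {
        List<List<Integer>> batches =
                Arrays.asList(Arrays.asList(1, 2, 3, 4), Arrays.asList(5, 6, 7, 8, 10));
        // Both variants report the same number of kept elements.
        System.out.println(countWithPeek(batches) == countPerBatch(batches));
    }
}
```

   In the real scan, the per-batch update would presumably go inside the lambda passed to `parallelismBatchIterable`, right after the list for that batch is collected.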


