You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "karuppayya (via GitHub)" <gi...@apache.org> on 2023/04/27 13:43:22 UTC

[GitHub] [iceberg] karuppayya opened a new pull request, #7447: Display Spark read metrics on Spark SQL UI

karuppayya opened a new pull request, #7447:
URL: https://github.com/apache/iceberg/pull/7447

   ![Screen Shot 2023-04-27 at 6 42 38 AM](https://user-images.githubusercontent.com/5082742/234880667-ba6189d3-2ffd-43b5-a8a3-15bc240b0ce5.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1254699323


##########
core/src/main/java/org/apache/iceberg/metrics/InMemoryReadMetricReporter.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public class InMemoryReadMetricReporter implements MetricsReporter {
+
+  private MetricsReport metricsReport;
+
+  @Override
+  public void report(MetricsReport report) {
+    this.metricsReport = (ScanReport) report;

Review Comment:
   I think this cast needs to be removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1254706408


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReadMetrics.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import static scala.collection.JavaConverters.seqAsJavaListConverter;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.spark.SparkTestBaseWithCatalog;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.execution.SparkPlan;
+import org.apache.spark.sql.execution.metric.SQLMetric;
+import org.junit.After;
+import org.junit.Test;
+import org.junit.jupiter.api.Assertions;
+import scala.collection.JavaConverters;
+
+public class TestSparkReadMetrics extends SparkTestBaseWithCatalog {
+
+  @After
+  public void removeTables() {
+    sql("DROP TABLE IF EXISTS %s", tableName);
+  }
+
+  @Test
+  public void testReadMetrics() throws NoSuchTableException {
+    sql("CREATE TABLE %s (id BIGINT) USING iceberg", tableName);
+
+    spark.range(10000).coalesce(1).writeTo(tableName).append();
+
+    Dataset<Row> df = spark.sql(String.format("select * from %s", tableName));
+    df.collect();
+
+    List<SparkPlan> sparkPlans =
+        seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
+    Map<String, SQLMetric> metricsMap =
+        JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
+    Assertions.assertEquals(1, metricsMap.get("scannedDataManifests").value());

Review Comment:
   also would it make sense to add some more metric checks here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1179971585


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReadMetrics.java:
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import static scala.collection.JavaConverters.seqAsJavaListConverter;
+
+import java.util.List;
+import org.apache.iceberg.spark.SparkTestBaseWithCatalog;
+import org.apache.iceberg.spark.source.metrics.TotalFileSize;
+import org.apache.iceberg.spark.source.metrics.TotalPlanningDuration;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.execution.SparkPlan;
+import org.apache.spark.sql.execution.metric.SQLMetric;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Test;
+import scala.collection.immutable.Map;
+
+public class TestSparkReadMetrics extends SparkTestBaseWithCatalog {
+
+  @After
+  public void removeTables() {
+    sql("DROP TABLE IF EXISTS %s", tableName);
+  }
+
+  @Test
+  public void testReadMetrics() throws NoSuchTableException {
+    sql("CREATE TABLE %s (id BIGINT) USING iceberg", tableName);
+
+    spark.range(10000).coalesce(1).writeTo(tableName).append();
+
+    Dataset<Row> df = spark.sql(String.format("select * from %s", tableName));
+    df.collect();
+
+    List<SparkPlan> sparkPlans =
+        seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
+    Map<String, SQLMetric> metrics = sparkPlans.get(0).metrics();
+
+    Assert.assertTrue(metrics.contains(TotalFileSize.name));

Review Comment:
   as that will show the content of the map in case the assertion ever fails



##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReadMetrics.java:
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import static scala.collection.JavaConverters.seqAsJavaListConverter;
+
+import java.util.List;
+import org.apache.iceberg.spark.SparkTestBaseWithCatalog;
+import org.apache.iceberg.spark.source.metrics.TotalFileSize;
+import org.apache.iceberg.spark.source.metrics.TotalPlanningDuration;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.execution.SparkPlan;
+import org.apache.spark.sql.execution.metric.SQLMetric;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Test;
+import scala.collection.immutable.Map;
+
+public class TestSparkReadMetrics extends SparkTestBaseWithCatalog {
+
+  @After
+  public void removeTables() {
+    sql("DROP TABLE IF EXISTS %s", tableName);
+  }
+
+  @Test
+  public void testReadMetrics() throws NoSuchTableException {
+    sql("CREATE TABLE %s (id BIGINT) USING iceberg", tableName);
+
+    spark.range(10000).coalesce(1).writeTo(tableName).append();
+
+    Dataset<Row> df = spark.sql(String.format("select * from %s", tableName));
+    df.collect();
+
+    List<SparkPlan> sparkPlans =
+        seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
+    Map<String, SQLMetric> metrics = sparkPlans.get(0).metrics();
+
+    Assert.assertTrue(metrics.contains(TotalFileSize.name));

Review Comment:
   ```suggestion
       Assertions.assertThat(metrics).contains(TotalFileSize.name);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1254703786


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TaskScannedDataManifests.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.iceberg.metrics.CounterResult;
+import org.apache.iceberg.metrics.ScanReport;
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+
+public class TaskScannedDataManifests implements CustomTaskMetric {
+  private final long value;
+
+  public TaskScannedDataManifests(long value) {
+    this.value = value;
+  }
+
+  @Override
+  public String name() {
+    return ScannedDataManifests.NAME;
+  }
+
+  @Override
+  public long value() {
+    return value;
+  }
+
+  public static TaskScannedDataManifests from(ScanReport scanReport) {
+    CounterResult counter = scanReport.scanMetrics().scannedDataManifests();
+    long value = counter != null ? counter.value() : -1;

Review Comment:
   is `-1` typical for Spark metrics to indicate that no data was available?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1262866936


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ResultDataFiles.java:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.spark.sql.connector.metric.CustomSumMetric;
+
+public class ResultDataFiles extends CustomSumMetric {
+
+  static final String NAME = "resultDataFiles";
+
+  @Override
+  public String name() {
+    return NAME;
+  }
+
+  @Override
+  public String description() {
+    return "result data files";

Review Comment:
   Would it be easier to understand if we show `number of scanned data files` in the UI?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] puchengy commented on pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "puchengy (via GitHub)" <gi...@apache.org>.
puchengy commented on PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#issuecomment-1645966041

   @karuppayya Thanks for sharing! It seems to much work to first to backport spark changes to spark 3.2, make a release, and backport this change back to spark 3.2 iceberg repo.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1180427666


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -256,4 +268,23 @@ public String toString() {
         runtimeFilterExpressions,
         caseSensitive());
   }
+
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {
+    List<CustomTaskMetric> customTaskMetrics = Lists.newArrayList();
+    MetricsReport metricsReport = sparkReadMetricReporter.getMetricsReport();
+    ScanReport scanReport = (ScanReport) metricsReport;

Review Comment:
   rather than casting here I think what would be better is to provide the necessary type of metrics report in the `SparkReadMetricsReporter`? This is because `report(..)` doesn't guarantee that you will only get a `ScanReport`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1254708505


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java:
##########
@@ -170,8 +189,35 @@ public String description() {
         table(), branch(), Spark3Util.describe(filterExpressions), groupingKeyFieldNamesAsString);
   }
 
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {
+    ScanReport scanReport = metricsReportSupplier != null ? metricsReportSupplier.get() : null;
+    if (scanReport == null) {

Review Comment:
   doesn't this behavior here contract with 
   ```
   public ScanReport scanReport() {
       Preconditions.checkArgument(
           metricsReport instanceof ScanReport, "Metric report is not a scan report");
       return (ScanReport) metricsReport;
     }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1248258767


##########
core/src/main/java/org/apache/iceberg/metrics/InMemoryReadMetricReporter.java:
##########
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+public class InMemoryReadMetricReporter implements MetricsReporter {

Review Comment:
   It is a bit hard to envision the proper design for this class but what if we make it a bit more generic and move the casting logic into the method that would access it?
   
   ```
   public class InMemoryMetricsReporter implements MetricsReporter {
   
     private MetricsReport metricsReport;
   
     @Override
     public void report(MetricsReport report) {
       this.metricsReport = report;
     }
   
     public ScanReport scanReport() {
       // TODO: maybe add a precondition with a good message?
       return (ScanReport) metricsReport;
     }
   }
   ```



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -73,9 +75,9 @@ class SparkBatchQueryScan extends SparkPartitioningAwareScan<PartitionScanTask>
       Scan<?, ? extends ScanTask, ? extends ScanTaskGroup<?>> scan,
       SparkReadConf readConf,
       Schema expectedSchema,
-      List<Expression> filters) {
-
-    super(spark, table, scan, readConf, expectedSchema, filters);
+      List<Expression> filters,
+      Supplier<ScanReport> metricsReportSupplier) {

Review Comment:
   What about calling git `scanReportSupplier` to be a bit specific?



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkStagedScanBuilder.java:
##########
@@ -39,6 +40,7 @@ class SparkStagedScanBuilder implements ScanBuilder {
 
   @Override
   public Scan build() {
-    return new SparkStagedScan(spark, table, readConf);
+    return new SparkStagedScan(
+        spark, table, readConf, (new InMemoryReadMetricReporter())::scanReport);

Review Comment:
   This change would not be needed if we pass `null` to parent constructor in `SparkStagedScan`.



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TotalFileSize.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.Arrays;
+import org.apache.spark.sql.connector.metric.CustomMetric;
+
+public class TotalFileSize implements CustomMetric {

Review Comment:
   I am not sure whether I already asked, can we extend `CustomSumMetric` and rely on built-in method for aggregating the result? It applies to all our `CustomMetric` implementations.



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -96,6 +97,7 @@ public class SparkScanBuilder
   private boolean caseSensitive;
   private List<Expression> filterExpressions = null;
   private Filter[] pushedFilters = NO_FILTERS;
+  private final InMemoryReadMetricReporter metricsReporter;

Review Comment:
   Minor: This should go to the block with final variables above, you may init it in in the definition (up to you).
   
   ```
   private final SparkReadConf readConf;
   private final List<String> metaColumns = Lists.newArrayList();
   private final InMemoryMetricsReporter metricsReporter = new InMemoryMetricsReporter();
   ```



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -73,9 +75,9 @@ class SparkBatchQueryScan extends SparkPartitioningAwareScan<PartitionScanTask>
       Scan<?, ? extends ScanTask, ? extends ScanTaskGroup<?>> scan,
       SparkReadConf readConf,
       Schema expectedSchema,
-      List<Expression> filters) {
-
-    super(spark, table, scan, readConf, expectedSchema, filters);
+      List<Expression> filters,
+      Supplier<ScanReport> metricsReportSupplier) {

Review Comment:
   It is actually called `scanReportSupplier` in other places, let's make it consistent.



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java:
##########
@@ -170,8 +189,35 @@ public String description() {
         table(), branch(), Spark3Util.describe(filterExpressions), groupingKeyFieldNamesAsString);
   }
 
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {
+    ScanReport scanReport = metricsReportSupplier != null ? metricsReportSupplier.get() : null;
+    if (scanReport == null) {
+      return new CustomTaskMetric[0];
+    }
+
+    List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
+    driverMetrics.add(TaskTotalFileSize.from(scanReport));

Review Comment:
   Is there a reason why we don't include all metrics from `ScanMetricsResult`?



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkStagedScan.java:
##########
@@ -39,8 +41,12 @@ class SparkStagedScan extends SparkScan {
 
   private List<ScanTaskGroup<ScanTask>> taskGroups = null; // lazy cache of tasks
 
-  SparkStagedScan(SparkSession spark, Table table, SparkReadConf readConf) {
-    super(spark, table, readConf, table.schema(), ImmutableList.of());
+  SparkStagedScan(
+      SparkSession spark,
+      Table table,
+      SparkReadConf readConf,
+      Supplier<ScanReport> scanReportSupplier) {
+    super(spark, table, readConf, table.schema(), ImmutableList.of(), scanReportSupplier);

Review Comment:
   Shall we simply pass `null` here and remove the supplier from the `SparkStagedScan` constructor? We know there would be no metrics report available as it is a staged scan.



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java:
##########
@@ -67,7 +84,9 @@ abstract class SparkScan implements Scan, SupportsReportStatistics {
       Table table,
       SparkReadConf readConf,
       Schema expectedSchema,
-      List<Expression> filters) {
+      List<Expression> filters,
+      Supplier<ScanReport> metricsReportSupplier) {
+    this.metricsReportSupplier = metricsReportSupplier;

Review Comment:
   Can we add this assignment as the last line in the constructors to follow the existing style?
   Also, let's call it `scanReportSupplier`.



##########
core/src/main/java/org/apache/iceberg/metrics/InMemoryReadMetricReporter.java:
##########
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+public class InMemoryReadMetricReporter implements MetricsReporter {

Review Comment:
   In the future, we may use it in other places and add more accessor methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1248287237


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReadMetrics.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import static scala.collection.JavaConverters.seqAsJavaListConverter;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.spark.SparkTestBaseWithCatalog;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.execution.SparkPlan;
+import org.apache.spark.sql.execution.metric.SQLMetric;
+import org.junit.After;
+import org.junit.Test;
+import org.junit.jupiter.api.Assertions;
+import scala.collection.JavaConverters;
+
+public class TestSparkReadMetrics extends SparkTestBaseWithCatalog {
+
+  @After
+  public void removeTables() {
+    sql("DROP TABLE IF EXISTS %s", tableName);
+  }
+
+  @Test
+  public void testReadMetrics() throws NoSuchTableException {

Review Comment:
   Can we add proper tests? I think the approach is correct, so we can add tests now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1262778894


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkPartitioningAwareScan.java:
##########
@@ -74,9 +76,10 @@ abstract class SparkPartitioningAwareScan<T extends PartitionScanTask> extends S
       Scan<?, ? extends ScanTask, ? extends ScanTaskGroup<?>> scan,
       SparkReadConf readConf,
       Schema expectedSchema,
-      List<Expression> filters) {
+      List<Expression> filters,
+      Supplier<ScanReport> metricsReportSupplier) {

Review Comment:
   Minor: `metricsReportSupplier` -> `scanReportSupplier`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1262769634


##########
core/src/main/java/org/apache/iceberg/metrics/InMemoryReadMetricReporter.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public class InMemoryReadMetricReporter implements MetricsReporter {
+
+  private MetricsReport metricsReport;
+
+  @Override
+  public void report(MetricsReport report) {
+    this.metricsReport = (ScanReport) report;

Review Comment:
   +1, this seems to be a generic container allowing to intercept a report.
   
   I'd also call it `InMemoryMetricsReporter`, there is nothing specific to reads right now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#issuecomment-1644765814

   It is awesome to get this done, @karuppayya! Thanks for reviewing, @nastra!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1265382952


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReadMetrics.java:
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import static scala.collection.JavaConverters.seqAsJavaListConverter;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.spark.SparkTestBaseWithCatalog;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.execution.SparkPlan;
+import org.apache.spark.sql.execution.metric.SQLMetric;
+import org.assertj.core.api.Assertions;
+import org.junit.After;
+import org.junit.Test;
+import scala.collection.JavaConverters;
+
+public class TestSparkReadMetrics extends SparkTestBaseWithCatalog {
+
+  @After
+  public void removeTables() {
+    sql("DROP TABLE IF EXISTS %s", tableName);
+  }
+
+  @Test
+  public void testReadMetricsForV1Table() throws NoSuchTableException {
+    sql(
+        "CREATE TABLE %s (id BIGINT) USING iceberg TBLPROPERTIES ('format-version'='2')",
+        tableName);
+
+    spark.range(10000).coalesce(1).writeTo(tableName).append();
+    spark.range(10001, 20000).coalesce(1).writeTo(tableName).append();
+
+    Dataset<Row> df = spark.sql(String.format("select * from %s where id < 10000", tableName));
+    df.collect();
+
+    List<SparkPlan> sparkPlans =
+        seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
+    Map<String, SQLMetric> metricsMap =
+        JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
+    Assertions.assertThat(metricsMap.get("skippedDataFiles").value()).isEqualTo(1);
+    Assertions.assertThat(metricsMap.get("scannedDataManifests").value()).isEqualTo(2);
+    Assertions.assertThat(metricsMap.get("scannedDataFiles").value()).isEqualTo(1);
+  }
+
+  @Test
+  public void testReadMetricsForV2Table() throws NoSuchTableException {
+    sql(
+        "CREATE TABLE %s (id BIGINT) USING iceberg TBLPROPERTIES ('format-version'='2')",
+        tableName);
+
+    spark.range(10000).coalesce(1).writeTo(tableName).append();
+    spark.range(10001, 20000).coalesce(1).writeTo(tableName).append();
+
+    Dataset<Row> df = spark.sql(String.format("select * from %s where id < 10000", tableName));
+    df.collect();
+
+    List<SparkPlan> sparkPlans =
+        seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
+    Map<String, SQLMetric> metricsMap =
+        JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
+    Assertions.assertThat(metricsMap.get("skippedDataFiles").value()).isEqualTo(1);
+    Assertions.assertThat(metricsMap.get("scannedDataManifests").value()).isEqualTo(2);
+    Assertions.assertThat(metricsMap.get("scannedDataFiles").value()).isEqualTo(1);

Review Comment:
   same as above



##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReadMetrics.java:
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import static scala.collection.JavaConverters.seqAsJavaListConverter;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.spark.SparkTestBaseWithCatalog;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.execution.SparkPlan;
+import org.apache.spark.sql.execution.metric.SQLMetric;
+import org.assertj.core.api.Assertions;
+import org.junit.After;
+import org.junit.Test;
+import scala.collection.JavaConverters;
+
+public class TestSparkReadMetrics extends SparkTestBaseWithCatalog {
+
+  @After
+  public void removeTables() {
+    sql("DROP TABLE IF EXISTS %s", tableName);
+  }
+
+  @Test
+  public void testReadMetricsForV1Table() throws NoSuchTableException {
+    sql(
+        "CREATE TABLE %s (id BIGINT) USING iceberg TBLPROPERTIES ('format-version'='2')",
+        tableName);
+
+    spark.range(10000).coalesce(1).writeTo(tableName).append();
+    spark.range(10001, 20000).coalesce(1).writeTo(tableName).append();
+
+    Dataset<Row> df = spark.sql(String.format("select * from %s where id < 10000", tableName));
+    df.collect();
+
+    List<SparkPlan> sparkPlans =
+        seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
+    Map<String, SQLMetric> metricsMap =
+        JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
+    Assertions.assertThat(metricsMap.get("skippedDataFiles").value()).isEqualTo(1);
+    Assertions.assertThat(metricsMap.get("scannedDataManifests").value()).isEqualTo(2);
+    Assertions.assertThat(metricsMap.get("scannedDataFiles").value()).isEqualTo(1);

Review Comment:
   can you add checks for all the other metrics here as well (even if they are 0)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi merged pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi merged PR #7447:
URL: https://github.com/apache/iceberg/pull/7447


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] frankliee commented on pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "frankliee (via GitHub)" <gi...@apache.org>.
frankliee commented on PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#issuecomment-1650900433

   Thanks for this pr. Could I backport this to Spark 3.3. @karuppayya


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1187476500


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TotalPlanningDuration.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.Arrays;
+import org.apache.spark.sql.connector.metric.CustomMetric;
+
+public class TotalPlanningDuration implements CustomMetric {
+
+  static final String name = "planningDuration";

Review Comment:
   should this be `totalPlanningDuration` instead? Same in the description



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/SparkReadMetricReporter.java:
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.Optional;
+import org.apache.iceberg.metrics.MetricsReport;
+import org.apache.iceberg.metrics.MetricsReporter;
+import org.apache.iceberg.metrics.ScanReport;
+
+public class SparkReadMetricReporter implements MetricsReporter {
+
+  private MetricsReport metricsReport;
+
+  @Override
+  public void report(MetricsReport report) {
+    this.metricsReport = report;
+  }
+
+  public Optional<ScanReport> getScanReport() {
+    return Optional.ofNullable((ScanReport) metricsReport);

Review Comment:
   ```suggestion
       return metricsReport instanceof ScanReport
           ? Optional.of((ScanReport) metricsReport)
           : Optional.empty();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1242358062


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/InMemoryReadMetricReporter.java:
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.function.Supplier;
+import org.apache.iceberg.metrics.MetricsReport;
+import org.apache.iceberg.metrics.MetricsReporter;
+import org.apache.iceberg.metrics.ScanReport;
+
+public class InMemoryReadMetricReporter implements MetricsReporter, Supplier<ScanReport> {
+
+  private MetricsReport metricsReport;
+
+  @Override
+  public void report(MetricsReport report) {
+    this.metricsReport = report;
+  }
+
+  @Override
+  public ScanReport get() {
+    return (ScanReport) metricsReport;

Review Comment:
   there's no guarantee that this will always return a `ScanReport`. There are also other types of reports



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#issuecomment-1643497921

   This LGTM once CI passes, @karuppayya could you rebase onto latest master please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1254704342


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TaskScannedDataManifests.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.iceberg.metrics.CounterResult;
+import org.apache.iceberg.metrics.ScanReport;
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+
+public class TaskScannedDataManifests implements CustomTaskMetric {
+  private final long value;
+
+  public TaskScannedDataManifests(long value) {

Review Comment:
   should be private because there's the `from(ScanReport)` method I would have assumed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1262807409


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -96,6 +97,7 @@ public class SparkScanBuilder
   private boolean caseSensitive;
   private List<Expression> filterExpressions = null;
   private Filter[] pushedFilters = NO_FILTERS;
+  private final InMemoryReadMetricReporter metricsReporter;

Review Comment:
   I think this still applies.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1262867834


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ScannedDataManifests.java:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.spark.sql.connector.metric.CustomSumMetric;
+
+public class ScannedDataManifests extends CustomSumMetric {
+
+  static final String NAME = "scannedDataManifests";
+
+  @Override
+  public String name() {
+    return NAME;
+  }
+
+  @Override
+  public String description() {
+    return "num scanned data manifests";

Review Comment:
   Is Spark using `number of ...` for output records? If so, can we match whatever Spark does?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1263097566


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java:
##########
@@ -170,8 +189,35 @@ public String description() {
         table(), branch(), Spark3Util.describe(filterExpressions), groupingKeyFieldNamesAsString);
   }
 
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {
+    ScanReport scanReport = metricsReportSupplier != null ? metricsReportSupplier.get() : null;
+    if (scanReport == null) {
+      return new CustomTaskMetric[0];
+    }
+
+    List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
+    driverMetrics.add(TaskTotalFileSize.from(scanReport));

Review Comment:
   Can be done in a follow-up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1262778323


##########
core/src/main/java/org/apache/iceberg/metrics/InMemoryReadMetricReporter.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public class InMemoryReadMetricReporter implements MetricsReporter {
+
+  private MetricsReport metricsReport;
+
+  @Override
+  public void report(MetricsReport report) {
+    this.metricsReport = (ScanReport) report;

Review Comment:
   +1 to both suggestions



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1262779959


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java:
##########
@@ -67,7 +84,8 @@ abstract class SparkScan implements Scan, SupportsReportStatistics {
       Table table,
       SparkReadConf readConf,
       Schema expectedSchema,
-      List<Expression> filters) {
+      List<Expression> filters,
+      Supplier<ScanReport> metricsReportSupplier) {

Review Comment:
   Minor: `metricsReportSupplier` -> `scanReportSupplier`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1264380556


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TaskTotalPlanningDuration.java:
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.iceberg.metrics.ScanReport;
+import org.apache.iceberg.metrics.TimerResult;
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+
+public class TaskTotalPlanningDuration implements CustomTaskMetric {
+
+  private final long value;
+
+  private TaskTotalPlanningDuration(long value) {
+    this.value = value;
+  }
+
+  @Override
+  public String name() {
+    return TotalPlanningDuration.NAME;
+  }
+
+  @Override
+  public long value() {
+    return value;
+  }
+
+  public static TaskTotalPlanningDuration from(ScanReport scanReport) {
+    TimerResult timerResult = scanReport.scanMetrics().totalPlanningDuration();
+    long value = timerResult != null ? timerResult.count() : -1;

Review Comment:
   I think it would be helpful to add a comment why this particular default value was chosen - with a reference to the spark default (here and at the other place)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1262875232


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TaskScannedDataManifests.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.iceberg.metrics.CounterResult;
+import org.apache.iceberg.metrics.ScanReport;
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+
+public class TaskScannedDataManifests implements CustomTaskMetric {
+  private final long value;
+
+  public TaskScannedDataManifests(long value) {
+    this.value = value;
+  }
+
+  @Override
+  public String name() {
+    return ScannedDataManifests.NAME;
+  }
+
+  @Override
+  public long value() {
+    return value;
+  }
+
+  public static TaskScannedDataManifests from(ScanReport scanReport) {
+    CounterResult counter = scanReport.scanMetrics().scannedDataManifests();
+    long value = counter != null ? counter.value() : -1;

Review Comment:
   We would probably need to check built-in Spark metrics as a reference.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1262783177


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java:
##########
@@ -170,8 +189,35 @@ public String description() {
         table(), branch(), Spark3Util.describe(filterExpressions), groupingKeyFieldNamesAsString);
   }
 
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {
+    ScanReport scanReport = metricsReportSupplier != null ? metricsReportSupplier.get() : null;
+    if (scanReport == null) {

Review Comment:
   I believe `instanceof` would return false if the report is null, causing an exception. We probably should fix the holder class to allow returning null.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] karuppayya commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "karuppayya (via GitHub)" <gi...@apache.org>.
karuppayya commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1262870999


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReadMetrics.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import static scala.collection.JavaConverters.seqAsJavaListConverter;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.spark.SparkTestBaseWithCatalog;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.execution.SparkPlan;
+import org.apache.spark.sql.execution.metric.SQLMetric;
+import org.junit.After;
+import org.junit.Test;
+import org.junit.jupiter.api.Assertions;
+import scala.collection.JavaConverters;
+
+public class TestSparkReadMetrics extends SparkTestBaseWithCatalog {

Review Comment:
   Added test for v1 and v2 tables.
   Metatdata tables(apart from [positional deletes table](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/PositionDeletesTable.java#L187)) , dont update `org.apache.iceberg.SnapshotScan#scanMetrics`.
   Since we dont have delete manifest metric in this PR, skipping the test for positional delete metadata table.



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TaskScannedDataManifests.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.iceberg.metrics.CounterResult;
+import org.apache.iceberg.metrics.ScanReport;
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+
+public class TaskScannedDataManifests implements CustomTaskMetric {
+  private final long value;
+
+  public TaskScannedDataManifests(long value) {
+    this.value = value;
+  }
+
+  @Override
+  public String name() {
+    return ScannedDataManifests.NAME;
+  }
+
+  @Override
+  public long value() {
+    return value;
+  }
+
+  public static TaskScannedDataManifests from(ScanReport scanReport) {
+    CounterResult counter = scanReport.scanMetrics().scannedDataManifests();
+    long value = counter != null ? counter.value() : -1;

Review Comment:
   I dont see any CustomTaskMetricImpl in Spark handling no data.
   Should we send 0 here 
   (I think ["N/A"](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala#L230) might not be a good idea and might convey that this data scan manifest was not releveant, WDYT)  



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java:
##########
@@ -170,8 +189,35 @@ public String description() {
         table(), branch(), Spark3Util.describe(filterExpressions), groupingKeyFieldNamesAsString);
   }
 
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {
+    ScanReport scanReport = metricsReportSupplier != null ? metricsReportSupplier.get() : null;
+    if (scanReport == null) {
+      return new CustomTaskMetric[0];
+    }
+
+    List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
+    driverMetrics.add(TaskTotalFileSize.from(scanReport));

Review Comment:
   I had added  all metrics from ScanMetricsResult at the point in time when the change was done,
   Should we add the remaining now or take in a different PR(since this cange is already reviewed almost)? 



##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReadMetrics.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import static scala.collection.JavaConverters.seqAsJavaListConverter;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.spark.SparkTestBaseWithCatalog;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.execution.SparkPlan;
+import org.apache.spark.sql.execution.metric.SQLMetric;
+import org.junit.After;
+import org.junit.Test;
+import org.junit.jupiter.api.Assertions;
+import scala.collection.JavaConverters;
+
+public class TestSparkReadMetrics extends SparkTestBaseWithCatalog {
+
+  @After
+  public void removeTables() {
+    sql("DROP TABLE IF EXISTS %s", tableName);
+  }
+
+  @Test
+  public void testReadMetrics() throws NoSuchTableException {
+    sql("CREATE TABLE %s (id BIGINT) USING iceberg", tableName);
+
+    spark.range(10000).coalesce(1).writeTo(tableName).append();
+
+    Dataset<Row> df = spark.sql(String.format("select * from %s", tableName));
+    df.collect();
+
+    List<SparkPlan> sparkPlans =
+        seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
+    Map<String, SQLMetric> metricsMap =
+        JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
+    Assertions.assertEquals(1, metricsMap.get("scannedDataManifests").value());

Review Comment:
   Changed it to use `org.assertj.core.api.Assertions`
   Added more checks



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java:
##########
@@ -170,8 +189,35 @@ public String description() {
         table(), branch(), Spark3Util.describe(filterExpressions), groupingKeyFieldNamesAsString);
   }
 
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {
+    ScanReport scanReport = metricsReportSupplier != null ? metricsReportSupplier.get() : null;
+    if (scanReport == null) {

Review Comment:
   I made it
   ```
     public ScanReport scanReport() {
       Preconditions.checkArgument(
           metricsReport == null || metricsReport instanceof ScanReport,
           "Metric report is not a scan report");
       return (ScanReport) metricsReport;
     }
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1269937719


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -449,8 +453,14 @@ private Scan buildBatchScan(Long snapshotId, Long asOfTimestamp, String branch,
     }
 
     scan = configureSplitPlanning(scan);
-

Review Comment:
   Can we keep this empty line?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] karuppayya commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "karuppayya (via GitHub)" <gi...@apache.org>.
karuppayya commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1242895614


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -65,6 +82,7 @@ class SparkBatchQueryScan extends SparkPartitioningAwareScan<PartitionScanTask>
   private final Long endSnapshotId;
   private final Long asOfTimestamp;
   private final String tag;
+  private final SparkReadMetricReporter sparkReadMetricReporter;

Review Comment:
   done



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -256,4 +275,56 @@ public String toString() {
         runtimeFilterExpressions,
         caseSensitive());
   }
+
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {
+    List<CustomTaskMetric> customTaskMetrics = Lists.newArrayList();
+    Optional<ScanReport> scanReportOptional = sparkReadMetricReporter.getScanReport();

Review Comment:
   done



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ResultDataFiles.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.Arrays;
+import org.apache.spark.sql.connector.metric.CustomMetric;
+
+public class ResultDataFiles implements CustomMetric {
+
+  static final String name = "resultDataFiles";
+
+  @Override
+  public String name() {
+    return name;
+  }
+
+  @Override
+  public String description() {
+    return "Result data files";

Review Comment:
   done



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -256,4 +275,56 @@ public String toString() {
         runtimeFilterExpressions,
         caseSensitive());
   }
+
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {
+    List<CustomTaskMetric> customTaskMetrics = Lists.newArrayList();
+    Optional<ScanReport> scanReportOptional = sparkReadMetricReporter.getScanReport();
+
+    scanReportOptional.ifPresent(
+        scanReport -> {
+          Optional.ofNullable(scanReport.scanMetrics().totalFileSizeInBytes())
+              .ifPresent(
+                  counterResult ->
+                      customTaskMetrics.add(new TaskTotalFileSize(counterResult.value())));
+          Optional.ofNullable(scanReport.scanMetrics().totalPlanningDuration())
+              .ifPresent(
+                  timerResult ->
+                      customTaskMetrics.add(new TaskTotalPlanningDuration(timerResult.count())));
+
+          Optional.ofNullable(scanReport.scanMetrics().skippedDataFiles())
+              .ifPresent(
+                  skippedDataFilesResult ->
+                      customTaskMetrics.add(
+                          new TaskSkippedDataFiles(skippedDataFilesResult.value())));
+          Optional.ofNullable(scanReport.scanMetrics().resultDataFiles())
+              .ifPresent(
+                  resultDataFilesResult -> new TaskResultDataFiles(resultDataFilesResult.value()));
+
+          Optional.ofNullable(scanReport.scanMetrics().skippedDataManifests())
+              .ifPresent(
+                  skippedDataManifestsResult ->
+                      customTaskMetrics.add(
+                          new TaskSkippedDataManifests(skippedDataManifestsResult.value())));
+          Optional.ofNullable(scanReport.scanMetrics().scannedDataManifests())
+              .ifPresent(
+                  scannedDataManifestResult ->
+                      customTaskMetrics.add(
+                          new TaskScannedDataManifests(scannedDataManifestResult.value())));
+        });
+
+    return customTaskMetrics.toArray(new CustomTaskMetric[0]);
+  }
+
+  @Override
+  public CustomMetric[] supportedCustomMetrics() {

Review Comment:
   done



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -256,4 +275,56 @@ public String toString() {
         runtimeFilterExpressions,
         caseSensitive());
   }
+
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {

Review Comment:
   done



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ResultDataFiles.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.Arrays;
+import org.apache.spark.sql.connector.metric.CustomMetric;
+
+public class ResultDataFiles implements CustomMetric {
+
+  static final String name = "resultDataFiles";
+
+  @Override
+  public String name() {
+    return name;
+  }
+
+  @Override
+  public String description() {
+    return "Result data files";
+  }
+
+  @Override
+  public String aggregateTaskMetrics(long[] taskMetrics) {

Review Comment:
   Done



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ResultDataFiles.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.Arrays;
+import org.apache.spark.sql.connector.metric.CustomMetric;
+
+public class ResultDataFiles implements CustomMetric {
+
+  static final String name = "resultDataFiles";

Review Comment:
   done



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -420,12 +423,15 @@ private Scan buildBatchScan() {
   private Scan buildBatchScan(Long snapshotId, Long asOfTimestamp, String branch, String tag) {
     Schema expectedSchema = schemaWithMetadataColumns();
 
+    sparkReadMetricReporter = new SparkReadMetricReporter();

Review Comment:
   removed



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/InMemoryReadMetricReporter.java:
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.function.Supplier;
+import org.apache.iceberg.metrics.MetricsReport;
+import org.apache.iceberg.metrics.MetricsReporter;
+import org.apache.iceberg.metrics.ScanReport;
+
+public class InMemoryReadMetricReporter implements MetricsReporter, Supplier<ScanReport> {
+
+  private MetricsReport metricsReport;
+
+  @Override
+  public void report(MetricsReport report) {
+    this.metricsReport = report;
+  }
+
+  @Override
+  public ScanReport get() {
+    return (ScanReport) metricsReport;

Review Comment:
   Added code to check type



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TaskResultDataFiles.java:
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+
+public class TaskResultDataFiles implements CustomTaskMetric {

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1262770542


##########
core/src/main/java/org/apache/iceberg/metrics/InMemoryReadMetricReporter.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public class InMemoryReadMetricReporter implements MetricsReporter {
+
+  private MetricsReport metricsReport;
+
+  @Override
+  public void report(MetricsReport report) {
+    this.metricsReport = (ScanReport) report;

Review Comment:
   We also should use `Metrics` vs `Metric` in the name to match the interface.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1262861193


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java:
##########
@@ -170,8 +189,35 @@ public String description() {
         table(), branch(), Spark3Util.describe(filterExpressions), groupingKeyFieldNamesAsString);
   }
 
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {
+    ScanReport scanReport = metricsReportSupplier != null ? metricsReportSupplier.get() : null;
+    if (scanReport == null) {
+      return new CustomTaskMetric[0];
+    }
+
+    List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
+    driverMetrics.add(TaskTotalFileSize.from(scanReport));

Review Comment:
   Here is a list of the metrics:
   ```
   TimerResult totalPlanningDuration(); // DONE
   CounterResult resultDataFiles(); // DONE
   CounterResult resultDeleteFiles(); // MISSING
   CounterResult totalDataManifests(); // MISSING
   CounterResult totalDeleteManifests(); // MISSING
   CounterResult scannedDataManifests(); // DONE
   CounterResult skippedDataManifests(); // DONE
   CounterResult totalFileSizeInBytes(); // DONE
   CounterResult totalDeleteFileSizeInBytes(); // MISSING
   CounterResult skippedDataFiles(); // DONE
   CounterResult skippedDeleteFiles(); // MISSING
   CounterResult scannedDeleteManifests(); // MISSING
   CounterResult skippedDeleteManifests(); // MISSING
   CounterResult indexedDeleteFiles(); // MISSING
   CounterResult equalityDeleteFiles(); // MISSING
   CounterResult positionalDeleteFiles(); // MISSING
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1262875232


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TaskScannedDataManifests.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.iceberg.metrics.CounterResult;
+import org.apache.iceberg.metrics.ScanReport;
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+
+public class TaskScannedDataManifests implements CustomTaskMetric {
+  private final long value;
+
+  public TaskScannedDataManifests(long value) {
+    this.value = value;
+  }
+
+  @Override
+  public String name() {
+    return ScannedDataManifests.NAME;
+  }
+
+  @Override
+  public long value() {
+    return value;
+  }
+
+  public static TaskScannedDataManifests from(ScanReport scanReport) {
+    CounterResult counter = scanReport.scanMetrics().scannedDataManifests();
+    long value = counter != null ? counter.value() : -1;

Review Comment:
   We would probably need to check built-in Spark metrics as a reference. It will probably be 0 in other cases as Spark passes an empty array of task values and then does sum on it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1263182514


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScan.java:
##########
@@ -170,8 +189,35 @@ public String description() {
         table(), branch(), Spark3Util.describe(filterExpressions), groupingKeyFieldNamesAsString);
   }
 
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {
+    ScanReport scanReport = scanReportSupplier != null ? scanReportSupplier.get() : null;
+    if (scanReport == null) {

Review Comment:
   Optional: What about an empty line before `if` to separate this block?



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TaskTotalPlanningDuration.java:
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.iceberg.metrics.ScanReport;
+import org.apache.iceberg.metrics.TimerResult;
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+
+public class TaskTotalPlanningDuration implements CustomTaskMetric {
+
+  private final long value;
+
+  private TaskTotalPlanningDuration(long value) {
+    this.value = value;
+  }
+
+  @Override
+  public String name() {
+    return TotalPlanningDuration.NAME;
+  }
+
+  @Override
+  public long value() {
+    return value;
+  }
+
+  public static TaskTotalPlanningDuration from(ScanReport scanReport) {
+    TimerResult timerResult = scanReport.scanMetrics().totalPlanningDuration();
+    long value = timerResult != null ? timerResult.count() : -1;

Review Comment:
   Shouldn't this be `totalDuration().toMillis()`? Shall we also use 0 as the default value?



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TotalFileSize.java:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.spark.sql.connector.metric.CustomSumMetric;
+
+public class TotalFileSize extends CustomSumMetric {
+
+  static final String NAME = "totalFileSize";
+
+  @Override
+  public String name() {
+    return NAME;
+  }
+
+  @Override
+  public String description() {
+    return "total file size";

Review Comment:
   What about `total file size (bytes)`?



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TotalPlanningDuration.java:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.spark.sql.connector.metric.CustomSumMetric;
+
+public class TotalPlanningDuration extends CustomSumMetric {
+
+  static final String NAME = "totalPlanningDuration";
+
+  @Override
+  public String name() {
+    return NAME;
+  }
+
+  @Override
+  public String description() {
+    return "total planning duration";

Review Comment:
   What about adding info on what values we show?
   
   ```
   total planning duration (ms)
   ```



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ScannedDataManifests.java:
##########
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.spark.sql.connector.metric.CustomSumMetric;
+
+public class ScannedDataManifests extends CustomSumMetric {
+
+  static final String NAME = "scannedDataManifests";
+
+  @Override
+  public String name() {
+    return NAME;
+  }
+
+  @Override
+  public String description() {
+    return "number of scanned data manifests";
+  }
+
+  @Override
+  public String aggregateTaskMetrics(long[] taskMetrics) {

Review Comment:
   This seems redundant as we extend `CustomSumMetric`?



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkPartitioningAwareScan.java:
##########
@@ -74,9 +76,10 @@ abstract class SparkPartitioningAwareScan<T extends PartitionScanTask> extends S
       Scan<?, ? extends ScanTask, ? extends ScanTaskGroup<?>> scan,
       SparkReadConf readConf,
       Schema expectedSchema,
-      List<Expression> filters) {
+      List<Expression> filters,
+      Supplier<ScanReport> scanReportSupplier) {
 

Review Comment:
   Same comment about the empty line here.



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkCopyOnWriteScan.java:
##########
@@ -68,9 +71,10 @@ class SparkCopyOnWriteScan extends SparkPartitioningAwareScan<FileScanTask>
       Snapshot snapshot,
       SparkReadConf readConf,
       Schema expectedSchema,
-      List<Expression> filters) {
+      List<Expression> filters,
+      Supplier<ScanReport> scanReportSupplier) {
 
-    super(spark, table, scan, readConf, expectedSchema, filters);

Review Comment:
   I see that you removed an empty line before super in `SparkBatchQueryScan` above. I think that makes sense, could you apply it to this class too for consistency?



##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReadMetrics.java:
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import static scala.collection.JavaConverters.seqAsJavaListConverter;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.spark.SparkTestBaseWithCatalog;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.execution.SparkPlan;
+import org.apache.spark.sql.execution.metric.SQLMetric;
+import org.assertj.core.api.Assertions;
+import org.junit.After;
+import org.junit.Test;
+import scala.collection.JavaConverters;
+
+public class TestSparkReadMetrics extends SparkTestBaseWithCatalog {
+
+  @After
+  public void removeTables() {
+    sql("DROP TABLE IF EXISTS %s", tableName);
+  }
+
+  @Test
+  public void testReadMetricsForV1Table() throws NoSuchTableException {
+    sql(
+        "CREATE TABLE %s (id BIGINT) USING iceberg TBLPROPERTIES ('format-version'='2')",
+        tableName);
+
+    spark.range(10000).coalesce(1).writeTo(tableName).append();
+    spark.range(10001, 20000).coalesce(1).writeTo(tableName).append();
+
+    Dataset<Row> df = spark.sql(String.format("select * from %s where id < 10000", tableName));
+    df.collect();
+
+    List<SparkPlan> sparkPlans =
+        seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
+    Map<String, SQLMetric> metricsMap =
+        JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
+    Assertions.assertThat(metricsMap.get("skippedDataFiles").value()).isEqualTo(1);
+    Assertions.assertThat(metricsMap.get("scannedDataManifests").value()).isEqualTo(2);
+    Assertions.assertThat(metricsMap.get("resultDataFiles").value()).isEqualTo(1);

Review Comment:
   I guess some of these were renamed.



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/scannedDataFiles.java:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.spark.sql.connector.metric.CustomSumMetric;
+
+public class scannedDataFiles extends CustomSumMetric {

Review Comment:
   Seems like a typo? Should start with a capital letter?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] frankliee commented on pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "frankliee (via GitHub)" <gi...@apache.org>.
frankliee commented on PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#issuecomment-1650896010

   Could I backport this pr to Spark 3.3? @karuppayya  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1254704342


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TaskScannedDataManifests.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.iceberg.metrics.CounterResult;
+import org.apache.iceberg.metrics.ScanReport;
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+
+public class TaskScannedDataManifests implements CustomTaskMetric {
+  private final long value;
+
+  public TaskScannedDataManifests(long value) {

Review Comment:
   should be private because there's the `from(ScanReport)` method I would have assumed? Same applies for all the other constructors.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1220722569


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -256,4 +275,56 @@ public String toString() {
         runtimeFilterExpressions,
         caseSensitive());
   }
+
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {

Review Comment:
   Why implement this only in `SparkBatchQueryScan`? Can we do this in `SparkScan`?



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -65,6 +82,7 @@ class SparkBatchQueryScan extends SparkPartitioningAwareScan<PartitionScanTask>
   private final Long endSnapshotId;
   private final Long asOfTimestamp;
   private final String tag;
+  private final SparkReadMetricReporter sparkReadMetricReporter;

Review Comment:
   What about using `Supplier<ScanReport> scanReportSupplier` to make this independent from a particular metrics reporter? We can use `metricsReporter::scanReport` closure to construct it.
   
   ```
   InMemoryMetricsReporter metricsReporter = new InMemoryMetricsReporter();
   ...
   scan.metricsReporter(metricsReporter)
   ...
   return new SparkBatchQueryScan(..., metricsReporter::scanReport);
   ```



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ResultDataFiles.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.Arrays;
+import org.apache.spark.sql.connector.metric.CustomMetric;
+
+public class ResultDataFiles implements CustomMetric {
+
+  static final String name = "resultDataFiles";

Review Comment:
   Static constant naming: `name` -> `NAME`



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/SparkReadMetricReporter.java:
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.Optional;
+import org.apache.iceberg.metrics.MetricsReport;
+import org.apache.iceberg.metrics.MetricsReporter;
+import org.apache.iceberg.metrics.ScanReport;
+
+public class SparkReadMetricReporter implements MetricsReporter {
+
+  private MetricsReport metricsReport;
+
+  @Override
+  public void report(MetricsReport report) {
+    this.metricsReport = report;
+  }
+
+  public Optional<ScanReport> getScanReport() {

Review Comment:
   Let's not use `Optional` and `getXXX` prefix.
   
   ```
   public ScanReport scanReport() {
     return (ScanReport) metricsReport;
   }
   ```
   
   We can check if the value is null later.



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ResultDataFiles.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.Arrays;
+import org.apache.spark.sql.connector.metric.CustomMetric;
+
+public class ResultDataFiles implements CustomMetric {
+
+  static final String name = "resultDataFiles";
+
+  @Override
+  public String name() {
+    return name;
+  }
+
+  @Override
+  public String description() {
+    return "Result data files";
+  }
+
+  @Override
+  public String aggregateTaskMetrics(long[] taskMetrics) {

Review Comment:
   Can we extend `CustomSumMetric` from Spark instead of overriding this?



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TaskResultDataFiles.java:
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+
+public class TaskResultDataFiles implements CustomTaskMetric {

Review Comment:
   I'd add a static `from(ScanReport scanReport)` to each `TaskXXX` metric like suggested above and make the constructor private.



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -256,4 +275,56 @@ public String toString() {
         runtimeFilterExpressions,
         caseSensitive());
   }
+
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {
+    List<CustomTaskMetric> customTaskMetrics = Lists.newArrayList();
+    Optional<ScanReport> scanReportOptional = sparkReadMetricReporter.getScanReport();
+
+    scanReportOptional.ifPresent(
+        scanReport -> {
+          Optional.ofNullable(scanReport.scanMetrics().totalFileSizeInBytes())
+              .ifPresent(
+                  counterResult ->
+                      customTaskMetrics.add(new TaskTotalFileSize(counterResult.value())));
+          Optional.ofNullable(scanReport.scanMetrics().totalPlanningDuration())
+              .ifPresent(
+                  timerResult ->
+                      customTaskMetrics.add(new TaskTotalPlanningDuration(timerResult.count())));
+
+          Optional.ofNullable(scanReport.scanMetrics().skippedDataFiles())
+              .ifPresent(
+                  skippedDataFilesResult ->
+                      customTaskMetrics.add(
+                          new TaskSkippedDataFiles(skippedDataFilesResult.value())));
+          Optional.ofNullable(scanReport.scanMetrics().resultDataFiles())
+              .ifPresent(
+                  resultDataFilesResult -> new TaskResultDataFiles(resultDataFilesResult.value()));
+
+          Optional.ofNullable(scanReport.scanMetrics().skippedDataManifests())
+              .ifPresent(
+                  skippedDataManifestsResult ->
+                      customTaskMetrics.add(
+                          new TaskSkippedDataManifests(skippedDataManifestsResult.value())));
+          Optional.ofNullable(scanReport.scanMetrics().scannedDataManifests())
+              .ifPresent(
+                  scannedDataManifestResult ->
+                      customTaskMetrics.add(
+                          new TaskScannedDataManifests(scannedDataManifestResult.value())));
+        });
+
+    return customTaskMetrics.toArray(new CustomTaskMetric[0]);
+  }
+
+  @Override
+  public CustomMetric[] supportedCustomMetrics() {

Review Comment:
   This overrides `supportedCustomMetrics()` in `SparkScan` and breaks it. Can we move this logic there and combine with existing metrics?



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java:
##########
@@ -420,12 +423,15 @@ private Scan buildBatchScan() {
   private Scan buildBatchScan(Long snapshotId, Long asOfTimestamp, String branch, String tag) {
     Schema expectedSchema = schemaWithMetadataColumns();
 
+    sparkReadMetricReporter = new SparkReadMetricReporter();

Review Comment:
   Why init it here again? We should either use local vars or not init it here.



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ScannedDataManifests.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.Arrays;
+import org.apache.spark.sql.connector.metric.CustomMetric;
+
+public class ScannedDataManifests implements CustomMetric {

Review Comment:
   Comments above apply to all metrics below.



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatchQueryScan.java:
##########
@@ -256,4 +275,56 @@ public String toString() {
         runtimeFilterExpressions,
         caseSensitive());
   }
+
+  @Override
+  public CustomTaskMetric[] reportDriverMetrics() {
+    List<CustomTaskMetric> customTaskMetrics = Lists.newArrayList();
+    Optional<ScanReport> scanReportOptional = sparkReadMetricReporter.getScanReport();

Review Comment:
   Iceberg historically uses null instead of `Optional` and I think we should continue to follow that. Also, Spotless makes these closures really hard to heard.
   
   What about something like this?
   
   ```
   @Override
   public CustomTaskMetric[] reportDriverMetrics() {
     ScanReport scanReport = scanReportSupplier.get();
   
     if (scanReport == null) {
       return new CustomTaskMetric[0];
     }
   
     List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
     driverMetrics.add(TaskTotalFileSize.from(scanReport));
     ...
     return driverMetrics.toArray(new CustomTaskMetric[0]);
   }
   ```
   
   Where `TaskTotalFileSize` would be defined as follows:
   
   ```
   public class TaskTotalFileSize implements CustomTaskMetric {
   
     private final long value;
   
     public static TaskTotalFileSize from(ScanReport scanReport) {
       CounterResult counter = scanReport.scanMetrics().totalFileSizeInBytes();
       long value = counter != null ? counter.value() : -1;
       return new TotalFileSizeTaskMetric(value);
     }
   
     private TaskTotalFileSize(long value) {
       this.value = value;
     }
   
     @Override
     public String name() {
       return TotalFileSize.NAME;
     }
   
     @Override
     public long value() {
       return value;
     }
   }
   ```



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ResultDataFiles.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.Arrays;
+import org.apache.spark.sql.connector.metric.CustomMetric;
+
+public class ResultDataFiles implements CustomMetric {
+
+  static final String name = "resultDataFiles";
+
+  @Override
+  public String name() {
+    return name;
+  }
+
+  @Override
+  public String description() {
+    return "Result data files";

Review Comment:
   We should follow what Spark does in other places. Specifically, its name does not with capital letters.



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/SparkReadMetricReporter.java:
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import java.util.Optional;
+import org.apache.iceberg.metrics.MetricsReport;
+import org.apache.iceberg.metrics.MetricsReporter;
+import org.apache.iceberg.metrics.ScanReport;
+
+public class SparkReadMetricReporter implements MetricsReporter {

Review Comment:
   Is there a better name? Shall we call it `InMemoryMetricsReporter` and more to `core`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] puchengy commented on pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "puchengy (via GitHub)" <gi...@apache.org>.
puchengy commented on PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#issuecomment-1645712259

   @karuppayya thanks for the contributions, would you mind porting these to lower spark versions? (I particularly interested in spark 3.2). Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] karuppayya commented on pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "karuppayya (via GitHub)" <gi...@apache.org>.
karuppayya commented on PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#issuecomment-1645848835

   @puchengy This cannot be ported as it is dependent on Spark changes to [support driver metrics](https://github.com/apache/spark/pull/37205), which is availble from Spark 3.4


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] karuppayya commented on pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "karuppayya (via GitHub)" <gi...@apache.org>.
karuppayya commented on PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#issuecomment-1636462712

   The test failures doesn't seem to be related
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] karuppayya commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "karuppayya (via GitHub)" <gi...@apache.org>.
karuppayya commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1263094934


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ScannedDataManifests.java:
##########
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.spark.sql.connector.metric.CustomSumMetric;
+
+public class ScannedDataManifests extends CustomSumMetric {
+
+  static final String NAME = "scannedDataManifests";
+
+  @Override
+  public String name() {
+    return NAME;
+  }
+
+  @Override
+  public String description() {
+    return "num scanned data manifests";

Review Comment:
   based on [DataSourceScanExec](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L224), changing the description



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TaskScannedDataManifests.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.iceberg.metrics.CounterResult;
+import org.apache.iceberg.metrics.ScanReport;
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+
+public class TaskScannedDataManifests implements CustomTaskMetric {
+  private final long value;
+
+  public TaskScannedDataManifests(long value) {
+    this.value = value;
+  }
+
+  @Override
+  public String name() {
+    return ScannedDataManifests.NAME;
+  }
+
+  @Override
+  public long value() {
+    return value;
+  }
+
+  public static TaskScannedDataManifests from(ScanReport scanReport) {
+    CounterResult counter = scanReport.scanMetrics().scannedDataManifests();
+    long value = counter != null ? counter.value() : -1;

Review Comment:
   Making it 0 based on [this](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L40C1-L40C1)



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/ScannedDataManifests.java:
##########
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.spark.sql.connector.metric.CustomSumMetric;
+
+public class ScannedDataManifests extends CustomSumMetric {
+
+  static final String NAME = "scannedDataManifests";
+
+  @Override
+  public String name() {
+    return NAME;
+  }
+
+  @Override
+  public String description() {
+    return "number of scanned data manifests";
+  }
+
+  @Override
+  public String aggregateTaskMetrics(long[] taskMetrics) {

Review Comment:
   Added for some debugging, removed.



##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TaskTotalPlanningDuration.java:
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.iceberg.metrics.ScanReport;
+import org.apache.iceberg.metrics.TimerResult;
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+
+public class TaskTotalPlanningDuration implements CustomTaskMetric {
+
+  private final long value;
+
+  private TaskTotalPlanningDuration(long value) {
+    this.value = value;
+  }
+
+  @Override
+  public String name() {
+    return TotalPlanningDuration.NAME;
+  }
+
+  @Override
+  public long value() {
+    return value;
+  }
+
+  public static TaskTotalPlanningDuration from(ScanReport scanReport) {
+    TimerResult timerResult = scanReport.scanMetrics().totalPlanningDuration();
+    long value = timerResult != null ? timerResult.count() : -1;

Review Comment:
   Keeping the default at -1 based on [Spark default](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L153)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1265369789


##########
core/src/main/java/org/apache/iceberg/metrics/InMemoryMetricsReporter.java:
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.metrics;
+
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+
+public class InMemoryMetricsReporter implements MetricsReporter {
+
+  private MetricsReport metricsReport;
+
+  @Override
+  public void report(MetricsReport report) {
+    this.metricsReport = report;
+  }
+
+  public ScanReport scanReport() {
+    Preconditions.checkArgument(
+        metricsReport == null || metricsReport instanceof ScanReport,
+        "Metric report is not a scan report");

Review Comment:
   ```suggestion
           "Metrics report is not a scan report");
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1254703786


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/metrics/TaskScannedDataManifests.java:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source.metrics;
+
+import org.apache.iceberg.metrics.CounterResult;
+import org.apache.iceberg.metrics.ScanReport;
+import org.apache.spark.sql.connector.metric.CustomTaskMetric;
+
+public class TaskScannedDataManifests implements CustomTaskMetric {
+  private final long value;
+
+  public TaskScannedDataManifests(long value) {
+    this.value = value;
+  }
+
+  @Override
+  public String name() {
+    return ScannedDataManifests.NAME;
+  }
+
+  @Override
+  public long value() {
+    return value;
+  }
+
+  public static TaskScannedDataManifests from(ScanReport scanReport) {
+    CounterResult counter = scanReport.scanMetrics().scannedDataManifests();
+    long value = counter != null ? counter.value() : -1;

Review Comment:
   is `-1` typical for Spark metrics to indicate that no data was available? Same applies for all the other constructors.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #7447: Display Spark read metrics on Spark SQL UI

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #7447:
URL: https://github.com/apache/iceberg/pull/7447#discussion_r1254706002


##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkReadMetrics.java:
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg.spark.source;
+
+import static scala.collection.JavaConverters.seqAsJavaListConverter;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.iceberg.spark.SparkTestBaseWithCatalog;
+import org.apache.spark.sql.Dataset;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.catalyst.analysis.NoSuchTableException;
+import org.apache.spark.sql.execution.SparkPlan;
+import org.apache.spark.sql.execution.metric.SQLMetric;
+import org.junit.After;
+import org.junit.Test;
+import org.junit.jupiter.api.Assertions;
+import scala.collection.JavaConverters;
+
+public class TestSparkReadMetrics extends SparkTestBaseWithCatalog {
+
+  @After
+  public void removeTables() {
+    sql("DROP TABLE IF EXISTS %s", tableName);
+  }
+
+  @Test
+  public void testReadMetrics() throws NoSuchTableException {
+    sql("CREATE TABLE %s (id BIGINT) USING iceberg", tableName);
+
+    spark.range(10000).coalesce(1).writeTo(tableName).append();
+
+    Dataset<Row> df = spark.sql(String.format("select * from %s", tableName));
+    df.collect();
+
+    List<SparkPlan> sparkPlans =
+        seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
+    Map<String, SQLMetric> metricsMap =
+        JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
+    Assertions.assertEquals(1, metricsMap.get("scannedDataManifests").value());

Review Comment:
   we're currently moving away from Junit assertions to AssertJ, can you please update this to `Assertions.assertThat(...).isEqualTo(1)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org