You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/04 19:50:40 UTC

[GitHub] [spark] viirya opened a new pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

viirya opened a new pull request #31476:
URL: https://github.com/apache/spark/pull/31476


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   This patch proposes to add a few public API change to DS v2, to make DS v2 scan can report metrics to Spark.
   
   Two public interfaces `CustomMetric` is added for general custom metric. `LongMetric` is for a metric which reports a long value. Currently we may only use `LongMetric` to align with `SQLMetric` which is long value based.
   
   There are two public methods added to existing public interfaces.
   
   * PartitionReader.getCustomMetrics(): returns an array of CustomMetric. Here is where the actual metrics values are collected. Empty array by default.
   * Scan.supportedCustomMetrics(): returns an array of supported custom metrics with name and description. Empty array by default.
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   In order to report custom metrics, we need some public API change in DS v2 to make it possible.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   
   This only adds interfaces. In follow-up PRs where adding implementation there will be tests added. See #31451 and #31398 for some details and manual test there.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804528556


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40953/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r578219652



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.

Review comment:
       Spark SQL internal metrics always return a long value. Do we still need the `LongMetric` interface if metrics types are limited to SQL internal metrics?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-781801426


   **[Test build #135252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135252/testReport)** for PR 31476 at commit [`cce58a5`](https://github.com/apache/spark/commit/cce58a56d5c48cb351688411d65159779550b5de).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570505479



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/CustomMetric.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;

Review comment:
       can this be used by both read and write path? 

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/LongMetric.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom metric that reports a long value.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface LongMetric extends CustomMetric {
+  /**
+   * Returns the name of custom metric.
+   */
+  String name();

Review comment:
       why it needs to redefine name and description?

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       maybe we should have a `SupportsReportMetrics` trait with this default implementation?

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/LongMetric.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom metric that reports a long value.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface LongMetric extends CustomMetric {
+  /**
+   * Returns the name of custom metric.
+   */
+  String name();
+
+  /**
+   * Returns the description of custom metric.
+   */
+  String description();
+
+  /**
+   * Returns the value of custom metric.
+   */
+  Long value();

Review comment:
       nit: would it be better to use `long` to avoid boxing/unboxing?

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       it's a nit from me so I'm fine either way. About the method naming, should we make it shorter like `metrics`? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773632690






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zsxwing commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
zsxwing commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r580487285



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.

Review comment:
       > The metric types must be supported by Spark SQL internal metrics.
   
   It sounds weird that we require users to understand internal private APIs when using a public API.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r578874597



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.

Review comment:
       I'll be +1 on returning the long value directly in `CustomMetric`. I don't have a strong opinion about the naming. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r574009954



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       @cloud-fan Removing `trait` is a little different story. The point was we don't need to change `Scan` interface like this every time. For the use cases which only depends on `Scan` interface, we can guarantee that nothing is be broken.
   > I don't see how a mixin trait helps avoid breaking change. Removing the trait later is also a breaking change.
   
   However, I also don't want to block this PR like this for a long time. So, I switch to give +1 for @rdblue's suggestion.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804742964


   **[Test build #136396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136396/testReport)** for PR 31476 at commit [`eb9d94a`](https://github.com/apache/spark/commit/eb9d94a07a38d6efcd71f12cc1c2db22bf8a7019).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804485460


   **[Test build #136369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136369/testReport)** for PR 31476 at commit [`a35c056`](https://github.com/apache/spark/commit/a35c05684f6c1d27257dc0c9e8b2a6d24666eb16).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-778700341


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135141/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773707048


   **[Test build #134886 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134886/testReport)** for PR 31476 at commit [`a0f738d`](https://github.com/apache/spark/commit/a0f738d639376f0981cd2edb4bb1d931150c6d93).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804881240


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40979/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zsxwing commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
zsxwing commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570666630



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom metric.

Review comment:
       Since this is a public API, it would be great to add more information to explain how to use APIs as a developer. For example,
   
   - If I define my own `CustomMetric`, how does Spark use it?
   - If I define a metric type that doesn't support `sum`, for example, measure the executor jvm heap size, how does Spark handle it?
   - If my `PartitionReaderWithMetrics` returns metrics for a partition, will Spark combine them for partitions of a Spark job?
   - In the streaming case, how does Spark combine metrics from different micro batches?
   - If I would like to report metrics in driver, how do I do it?
   
   I feel you might need to build an interface similar to the private `SQLMetric`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-781937240


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135252/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zsxwing commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
zsxwing commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r580522347



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.

Review comment:
       +1 for removing it. I think users just need to know the below line: `SUM: Spark sums up metrics from partitions as the final result.`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-785170297


   I think it's too much work to support arbitrary accumulators in `SQLMetrics`. During these years, it seems sufficient to support various metrics (size, time, average, etc.) with simple long values. So I agree with the design here that the v2 metrics API should leverage `SQLMetrics` under the hood.
   
   However, I share the same concern from @zsxwing that using enum to indicate the metrics merging logic is too limited. I don't think it's hard to design an API to merge a bunch of long values.
   
   First, let's look at the task side. The read/write task needs to update the value of executor-side `SQLMetrics`, and Spark will send back the metrics update to the driver via heartbeat event or task complete event. The current PR adds a new method to `PartitionReader` to report the current metrics values, which looks good. Spark needs to call the new method at the end of task (at the end of each epoch for continuous mode), get the metrics value, find the corresponding `SQLMetrics` and update its value. We can polish the API from the current PR a little bit:
   ```
   interface CustomTaskMetrics {
     def name: String
     def value: Long
   }
   
   interface PartitionReader {
     ...
     def currentMetrics: Array[CustomTaskMetrics] = Nil
   }
   ```
   
   Then, let's look at the driver side. The driver side needs to aggregate the task metrics and produce a string that will be displayed in the UI. The current PR adds a new API `CustomMetrics` which is returned by `Scan`. This looks good and we just need to replace the metrics type with a real merge method
   ```
   interface CustomMetrics {
     def name: String
     ...
     def aggregateTaskMetrics(taskMetrics: Array[Long]): String
   }
   ```
   
   Internally, we can add a new `V2_CUSTOM` metrics type in `SQLMetrics`. For this metrics type, we delegate the aggregating logic to the corresponding `CustomMetrics`.
   
   What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570608230



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       I can see the argument for a separate trait, but I would opt not to add this in one. We want to encourage sources to provide metrics. It's easier to see that there's a method that can be implemented than to see an interface that can be mixed in, especially when that interface might extend both `Scan` and `Write` (and not extend from either). There's also just more cost with having a lot of interfaces for very specific things.
   
   I think an optional method is a good balance of concerns: it is optional, but easily discovered and doesn't add another interface that an implementer needs to know about and know which class to implement it in.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804485460


   **[Test build #136369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136369/testReport)** for PR 31476 at commit [`a35c056`](https://github.com/apache/spark/commit/a35c05684f6c1d27257dc0c9e8b2a6d24666eb16).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r578265717



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.

Review comment:
       For now a `CustomMetric` which returns long value should be enough. Previously there is a suggestion to name it just `LongMetric` because it always returns long value. But it sounds too concrete.
   
   So I think is it good to leave some flexibility here? Then I extract `LongMetric` out as a separate interface which returns long value.
   
   I'm fine to just return long in `CustomMetric` (and maybe name it to `LongMetric`).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773632690


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39474/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-778663979


   Thanks for the discussion so far. I've updated the change. Please have another look if you have time. Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570503897



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/CustomMetric.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A general custom metric.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {

Review comment:
       The comment https://github.com/apache/spark/pull/31451#discussion_r569774028 suggested to name it `LongMetric`. But later I think if we may need keep flexibility in the base interface.
   
   So I left it as general as possible and add a `LongMetric` which reports a long value.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773564872


   cc @rdblue @Ngone51 @cloud-fan @sunchao @dongjoon-hyun this is separated from #31451 and only includes interface changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773794508


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134896/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r572300669



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/SupportsReportMetrics.java
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Data sources can implement this interface to
+ * report supported custom metrics to Spark in read/write path.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface SupportsReportMetrics {
+
+    /**
+     * Returns an array of supported custom metrics with name and description.
+     * By default it returns empty array.
+     */
+    default CustomMetric[] supportedCustomMetrics() {
+        CustomMetric[] NO_METRICS = {};

Review comment:
       shall we define it as a private static field?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zsxwing commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
zsxwing commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r580520711



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.
+     * SUM: Spark sums up metrics from partitions as the final result.
+     */
+    enum MetricType {
+      SUM

Review comment:
       I'm wondering if we can design a better API. This would be a public API and hard to change in future. We don't have to build it in a way just because of what SQLMetric can do today. When I added `SQLMetric` to Spark SQL, we didn't pay much attention to this since it's an internal API. But making a new public API is a different story and we need to think about how  to use it from the user perspective. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r580807887



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.
+     * SUM: Spark sums up metrics from partitions as the final result.
+     */
+    enum MetricType {
+      SUM

Review comment:
       As you can see, the metric API here is a logical representation of metrics from DS v2. We are not going to re-invent a whole metric API. SQLMetrics are internal to Spark. It is not exposed to end users and data source developers, so I don't think it worries me too much.
   
   I'm not saying that we should not build a good API for DS v2 developers. Seems to me some points in above comments are from end user perspective, I'd like to point out this is for different scenarios.
   
   As this is used for DS v2 purpose, it is for SQL metrics and internally it is converted to SQL metrics. To make SQLMetric support Accmulator and let DS v2 reports Accmulator does not sound bad idea to me. But I'd doubt if it is worth.
   
   One argued point is to define arbitrary combine behavior. Once making SQLMetric support Accmulator, does it mean that we can use arbitrary Accmulator? No, basically SQLMetric allows certain types of metrics, we still need to change SQLMetric to support new metrics.
   
   So the only benefit I thought is to not have another metric API. And I don't think it is serious for this case at the beginning.
    
   This API is pretty simple as it is just logical representation and we only need small change internally to convert collected metrics from DS v2 to SQL metrics.
   
   I just read through the code path to be touched in order to make SQLMetric support Accmulator. Seems it involves more changes not only in DS v2 but maybe also other stuff in sql/core, etc.
   
   Although I doubt if it is worth, I'm open to the suggested Accmulator approach. Let's gather more thoughts from others?
   
   cc @cloud-fan @dongjoon-hyun @rdblue @Ngone51 @sunchao WDYT?
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-803471222


   Hi, @viirya . What is the next step for this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570513249



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       Hmm, I'm okay on putting it into `SupportsReportMetrics`.
   
   From implementation view, I don't see it is much different. Seems we just need to additionally check if a `Scan` is a `SupportsReportMetrics` when we want to get the supported metric list.
   
   Semantically it sounds the same. A scan reports no custom metrics v.s. A scan does not support custom metrics.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804488767


   **[Test build #136370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136370/testReport)** for PR 31476 at commit [`f46b733`](https://github.com/apache/spark/commit/f46b733c2ec276dad31aa7732ff2349fd4363e52).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773692487


   **[Test build #134896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134896/testReport)** for PR 31476 at commit [`38fb966`](https://github.com/apache/spark/commit/38fb9667cfcdfc7f103b574e80684380a95d19e4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570621632



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       I'd prefer an optional method so it sounds good. @dongjoon-hyun @sunchao WDYT?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r580529783



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.
+     * SUM: Spark sums up metrics from partitions as the final result.
+     */
+    enum MetricType {
+      SUM

Review comment:
       Hmm, these metrics are not general metrics that could be defined and used by end users. It sounds like we already have Accumulator for the purpose. Defining a public API of general metrics for end users is overlapping with Accumulator, IMHO.
   
   The metrics here are used by DS v2 implementations to report metrics to Spark SQL. That being said, it is not exposed for end users as general metrics. I think the purpose for the metric API is clear.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773721578


   **[Test build #134891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134891/testReport)** for PR 31476 at commit [`623f193`](https://github.com/apache/spark/commit/623f1932133cb7e4d77ab5bd3f1356e7e7b8da78).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773619143


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39474/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773752119


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39482/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804528556


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40953/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804616183


   **[Test build #136370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136370/testReport)** for PR 31476 at commit [`f46b733`](https://github.com/apache/spark/commit/f46b733c2ec276dad31aa7732ff2349fd4363e52).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r572298710



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       I don't see how a mixin trait helps avoid breaking change. Removing the trait later is also a breaking change. Agree with @rdblue that an optional method is simple and sufficient. Some other features have mixin trait because they don't have a reasonable default if made into optional methods. Metrics is OK as we can return empty metrics as the default.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r580514760



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.

Review comment:
       "The metric types must be supported by Spark SQL internal metrics." is more for the developers trying to add new metrics here, instead of using the API. I can remove it if it sounds inappropriate.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570573185



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       I also thought like @sunchao . +1 for a new trait.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570503897



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/CustomMetric.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A general custom metric.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {

Review comment:
       The comment https://github.com/apache/spark/pull/31451#discussion_r569774028 suggested to name it `LongMetric`. But later I think if we may need keep flexibility in the base interface.
   
   So I left it as general as possible and add a `LongMetric` which reports a long value.

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/CustomMetric.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;

Review comment:
       Oh, yea, @rdblue suggested we can add to writes. Let me change the package. Thanks.

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/LongMetric.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom metric that reports a long value.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface LongMetric extends CustomMetric {
+  /**
+   * Returns the name of custom metric.
+   */
+  String name();

Review comment:
       Oops, yea, no need to.

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       Hmm, I'm okay on putting it into `SupportsReportMetrics`.
   
   From implementation view, I don't see it is much different. Seems we just need to additionally check if a `Scan` is a `SupportsReportMetrics` when we want to get the supported metric list.
   
   Semantically it sounds the same. A scan reports no custom metrics v.s. A scan does not support custom metrics.

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReader.java
##########
@@ -48,4 +49,12 @@
    * Return the current record. This method should return same value until `next` is called.
    */
   T get();
+
+  /**
+   * Returns an array of custom metrics. By default it returns empty array.
+   */
+  default CustomMetric[] getCustomMetrics() {

Review comment:
       I see. I will make it as a trait.

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       I'd prefer an optional method so it sounds good. @dongjoon-hyun @sunchao WDYT?

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom metric.

Review comment:
       I will add more information to the comment.
   
   Basically I will add a few metric classes based on `CustomMetric`. They are correspond to sum, size, timing metrics in `SQLMetric`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570625284



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       it's a nit from me so I'm fine either way. About the method naming, should we make it shorter like `metrics`? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r572265001



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       Thank you @rdblue.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r572414083



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       cc @dongjoon-hyun Does it convincing for you?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570608230



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       I can see the argument for a separate trait, but I would opt not to add this in one. We want to encourage sources to provide metrics. It's easier to see that there's a method that can be implemented than to see an interface that can be mixed in, especially when that interface might extend both `Scan` and `Write` (and not extend from either). There's also just more cost with having a lot of interfaces for very specific things.
   
   I think an optional method is a good balance of concerns: it is optional, but easily discovered and doesn't add another interface that an implementer needs to know about and know which class to implement it in.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r571537663



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       I'm fine for a separate trait. It is great if we can have a consensus so this can move forward. @rdblue Please let me know if you have strong option here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #31476:
URL: https://github.com/apache/spark/pull/31476


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zsxwing commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
zsxwing commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r580756025



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.
+     * SUM: Spark sums up metrics from partitions as the final result.
+     */
+    enum MetricType {
+      SUM

Review comment:
       > Defining a public API of general metrics for end users is overlapping with Accumulator, IMHO.
   
   Totally agree. There would be 3 types of metrics APIs (including the internal SQLMetric) if we added this. That's really confusing to users. Is it possible to make SQLMetric support general Accmulator instead so that we don't need to re-invent a new metric API?
   
   > That being said, it is not exposed for end users as general metrics.
   
   Developers building spark data sources are also critical to the whole Spark ecosystem and these APIs are designed for them. IMHO, we should try our best to build a better API if possible.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r580513620



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.
+     * SUM: Spark sums up metrics from partitions as the final result.
+     */
+    enum MetricType {
+      SUM

Review comment:
       Because these metrics are logical representation and are collected to be SQL metrics internally. They are used by DS v2 to report SQL metrics easily without dealing with internal `SQLMetric`. So SQL metrics define how metrics are combined/aggregated. A similar case is public expression APIs for predicate pushdown, as they are converted from catalyst expressions so are matched to catalyst expressions.
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-778673212


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39722/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773579295


   **[Test build #134886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134886/testReport)** for PR 31476 at commit [`a0f738d`](https://github.com/apache/spark/commit/a0f738d639376f0981cd2edb4bb1d931150c6d93).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773564872


   cc @rdblue @Ngone51 @cloud-fan @sunchao @dongjoon-hyun this is separated from #31451 and only includes interface changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-785558137


   I share the direction to have custom merging behavior. So that being said, I agree with the polished idea from the API perspective.
   
   I don't have question about the task side. It is basically the same except for the enum metric type.
   
   My only question is how to delegate the aggregating logic to the aggregate method. As we cannot access `CustomMetrics` during processing execution live information. I think we need to change `SQLMetrics` to include the aggregate method if it is `V2_CUSTOM` metrics type, e.g.
   
   ```scala
   class SQLMetric(val metricType: String, initValue: Long = 0L, customAggregateMethod: Option[Array[Long] -> String]) extends AccumulatorV2[Long, Long] {
     ...
   }
   ```
   
   When we aggregate the metrics, for `V2_CUSTOM` metric type, we ask original `SQLMetric` from `AccumulatorContext` and call the aggregate method.
   
   Does it make sense?
   
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804620823






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773692487


   **[Test build #134896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134896/testReport)** for PR 31476 at commit [`38fb966`](https://github.com/apache/spark/commit/38fb9667cfcdfc7f103b574e80684380a95d19e4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773632690


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39474/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570572005



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A general custom metric.

Review comment:
       Shall we remove `general`?

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       I also thought like @sunchao . +1 for a new trait.

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReader.java
##########
@@ -48,4 +49,12 @@
    * Return the current record. This method should return same value until `next` is called.
    */
   T get();
+
+  /**
+   * Returns an array of custom metrics. By default it returns empty array.
+   */
+  default CustomMetric[] getCustomMetrics() {

Review comment:
       Also, can we split this as a trait, too?

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/SupportsReportMetrics.java
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Data sources can implement this interface to
+ * report supported custom metrics to Spark in read/write path.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface SupportsReportMetrics {
+
+    /**
+     * Returns an array of supported custom metrics with name and description.
+     * By default it returns empty array.
+     */
+    default CustomMetric[] supportedCustomMetrics() {

Review comment:
       2-space Indentation?

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReaderWithMetrics.java
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.CustomMetric;
+
+/**
+ * A mix in interface for {@link PartitionReader}. Partition reader can this interface to
+ * report custom metrics to Spark.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface PartitionReaderWithMetrics<T> extends PartitionReader<T> {
+
+    /**
+     * Returns an array of custom metrics. By default it returns empty array.
+     */
+    default CustomMetric[] getCustomMetrics() {

Review comment:
       2-space Indentation?

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReaderWithMetrics.java
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.CustomMetric;
+
+/**
+ * A mix in interface for {@link PartitionReader}. Partition reader can this interface to

Review comment:
       `can this` -> `can implement this`

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       Encourage is different from enforcing.
   >  We want to encourage sources to provide metrics.
   
   In addition, if we tightly couple this together, we are unable to split it permanently without a breaking change. That's a design risk to me.
   > There's also just more cost with having a lot of interfaces for very specific things.
   
   Given that, I'd recommend to put orthogonal concept separately.
   
   Since this is a design issue, cc @cloud-fan , @gatorsmile , @gengliangwang , too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570572005



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A general custom metric.

Review comment:
       Shall we remove `general`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773711985


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134886/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-785882143


   Yes, `SQLMetrics` is the one to define the aggregating logic. Previously it's decided by `metricsType`, now we can probably replace `metricsType` with a general aggregating function: `aggregateMethod: (Array[Long], Array[Long]) => String`. For builtin metrics it's `SQLMetrics.stringValue(metricsType, _, _)`. For v2 metrics we ignore the second parameter. Seems we don't need `V2_CUSTOM` :)
    
   It's the SQL UI component that collects the task metrics and aggregates them, so we should let the SQL UI component knows the custom aggregating logic. We can propagate it through `SQLMetricInfo`, `SQLPlanMetric`, etc.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570507446



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/LongMetric.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom metric that reports a long value.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface LongMetric extends CustomMetric {
+  /**
+   * Returns the name of custom metric.
+   */
+  String name();

Review comment:
       Oops, yea, no need to.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-778675245


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39722/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-783335638


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-786372277


   Let's make this PR API only, as we know it's implementable. We can discuss the implementation details in the followup PR.
   
   I'd avoid using `AccumulatorContext` as it uses weak references and is not robust.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773579295


   **[Test build #134886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134886/testReport)** for PR 31476 at commit [`a0f738d`](https://github.com/apache/spark/commit/a0f738d639376f0981cd2edb4bb1d931150c6d93).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773793576


   **[Test build #134896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134896/testReport)** for PR 31476 at commit [`38fb966`](https://github.com/apache/spark/commit/38fb9667cfcdfc7f103b574e80684380a95d19e4).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773665448


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39478/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804985286


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136396/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804614005


   **[Test build #136369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136369/testReport)** for PR 31476 at commit [`a35c056`](https://github.com/apache/spark/commit/a35c05684f6c1d27257dc0c9e8b2a6d24666eb16).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan edited a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
cloud-fan edited a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-785170297


   I think it's too much work to support arbitrary accumulators in `SQLMetrics`. During these years, it seems sufficient to support various metrics (size, time, average, etc.) with simple long values. So I agree with the design here that the v2 metrics API should leverage `SQLMetrics` under the hood.
   
   However, I share the same concern from @zsxwing that using enum to indicate the metrics merging logic is too limited. I don't think it's hard to design an API to merge a bunch of long values.
   
   First, let's look at the task side. The read/write task needs to update the value of executor-side `SQLMetrics`, and Spark will send back the metrics update to the driver via heartbeat event or task complete event. The current PR adds a new method to `PartitionReader` to report the current metrics values, which looks good. Spark needs to call the new method at the end of task (at the end of each epoch for continuous mode), get the metrics value, find the corresponding `SQLMetrics` and update its value. We can polish the API from the current PR a little bit:
   ```
   interface CustomTaskMetrics {
     def name: String
     def value: Long
   }
   
   interface PartitionReader {
     ...
     def currentMetricsValues: Array[CustomTaskMetrics] = Nil
   }
   ```
   
   Then, let's look at the driver side. The driver side needs to aggregate the task metrics and produce a string that will be displayed in the UI. The current PR adds a new API `CustomMetrics` which is returned by `Scan`. This looks good and we just need to replace the metrics type with a real merge method
   ```
   interface CustomMetrics {
     def name: String
     ...
     def initialValue: Long = 0
     def aggregateTaskMetrics(taskMetrics: Array[Long]): String
   }
   ```
   
   Internally, we can add a new `V2_CUSTOM` metrics type in `SQLMetrics`. For this metrics type, we delegate the aggregating logic to the corresponding `CustomMetrics`.
   
   What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804742964


   **[Test build #136396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136396/testReport)** for PR 31476 at commit [`eb9d94a`](https://github.com/apache/spark/commit/eb9d94a07a38d6efcd71f12cc1c2db22bf8a7019).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773684368


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39478/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r575701986



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.
+     * SUM: Spark sums up metrics from partitions as the final result.
+     */
+    enum MetricType {
+      SUM

Review comment:
       I only use sum metric for now for the Kafka scan purpose. So I leave other possible metric type (size, timing, average) out for now to make it simpler at the beginning.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804985286


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136396/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-778700341


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135141/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570650477



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       Encourage is different from enforcing.
   >  We want to encourage sources to provide metrics.
   
   In addition, if we tightly couple this together, we are unable to split it permanently without a breaking change. That's a design risk to me.
   > There's also just more cost with having a lot of interfaces for very specific things.
   
   Given that, I'd recommend to put orthogonal concept separately.
   
   Since this is a design issue, cc @cloud-fan , @gatorsmile , @gengliangwang , too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804890588


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40979/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570507222



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/CustomMetric.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;

Review comment:
       Oh, yea, @rdblue suggested we can add to writes. Let me change the package. Thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773684368


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39478/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570695900



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom metric.

Review comment:
       I will add more information to the comment.
   
   Basically I will add a few metric classes based on `CustomMetric`. They are correspond to sum, size, timing metrics in `SQLMetric`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-778675245


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39722/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r578898016



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.

Review comment:
       Okay, let me move it back. See if others have different thoughts.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-786112772


   I think we have a couple choices.
   
   * When we aggregate the metrics, for `V2_CUSTOM` metric type, we access the `SQLMetric` from `AccumulatorContext` and call its aggregate method. For built-in metrics, just call `SQLMetrics.stringValue` as usual.
   * Replace `metricsType` with a general aggregating function. We propagate it through `SQLMetricInfo`, `SQLPlanMetric`, etc. In SQL UI component, we call the aggregating.
   
   But these are implementation details, I think the choice doesn't affect the public API here.
   
   Do you prefer to make the choice here and put it into same PR, or we polish the public API only in this PR and then create another PR for implementation?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r575695979



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/SupportsReportMetrics.java
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Data sources can implement this interface to
+ * report supported custom metrics to Spark in read/write path.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface SupportsReportMetrics {
+
+    /**
+     * Returns an array of supported custom metrics with name and description.
+     * By default it returns empty array.
+     */
+    default CustomMetric[] supportedCustomMetrics() {
+        CustomMetric[] NO_METRICS = {};

Review comment:
       Unfortunately, Java interface cannot define private or protected field.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773729738


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134891/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773676270


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39478/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773579295






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-781801426


   **[Test build #135252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135252/testReport)** for PR 31476 at commit [`cce58a5`](https://github.com/apache/spark/commit/cce58a56d5c48cb351688411d65159779550b5de).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-778673685


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39722/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804533230


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40954/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773632690






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773605675


   **[Test build #134891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134891/testReport)** for PR 31476 at commit [`623f193`](https://github.com/apache/spark/commit/623f1932133cb7e4d77ab5bd3f1356e7e7b8da78).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773711985


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134886/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773604703


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39474/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan edited a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
cloud-fan edited a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-786372277


   Let's make this PR API only, as we know it's implementable. We can discuss the implementation details in the followup PR.
   
   I'd avoid using `AccumulatorContext` as it uses weak references and is not robust. That's probably why we propagate the metrics type through UI objects, not `AccumulatorContext`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zsxwing commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
zsxwing commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570666630



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom metric.

Review comment:
       Since this is a public API, it would be great to add more information to explain how to use APIs as a developer. For example,
   
   - If I define my own `CustomMetric`, how does Spark use it?
   - If I define a metric type that doesn't support `sum`, for example, measure the executor jvm heap size, how does Spark handle it?
   - If my `PartitionReaderWithMetrics` returns metrics for a partition, will Spark combine them for partitions of a Spark job?
   - In the streaming case, how does Spark combine metrics from different micro batches?
   - If I would like to report metrics in driver, how do I do it?
   
   I feel you might need to build an interface similar to the private `SQLMetric`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773752119


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39482/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r580513620



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.
+     * SUM: Spark sums up metrics from partitions as the final result.
+     */
+    enum MetricType {
+      SUM

Review comment:
       Because these metrics are logical representation and are collected to be SQL metrics internally. They are used by DS v2 to report SQL metrics easily. So SQL metrics define how metrics are combined/aggregated. A similar case is public expression APIs for predicate pushdown, as they are converted from catalyst expressions so are matched to catalyst expressions.
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773579295






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r572270360



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       No problem! I didn't intend to delay this, I just thought that other people might agree with my reasoning.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804620823






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r574026111



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/SupportsReportMetrics.java
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Data sources can implement this interface to
+ * report supported custom metrics to Spark in read/write path.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface SupportsReportMetrics {
+
+    /**
+     * Returns an array of supported custom metrics with name and description.
+     * By default it returns empty array.
+     */
+    default CustomMetric[] supportedCustomMetrics() {
+        CustomMetric[] NO_METRICS = {};

Review comment:
       +1 for @cloud-fan 's suggestion to declare static field `NO_METRICS`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773729738


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134891/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-778669917


   **[Test build #135141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135141/testReport)** for PR 31476 at commit [`b8c762a`](https://github.com/apache/spark/commit/b8c762a776d535a1083c424a61922dc8d1401ccd).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-778697579


   **[Test build #135141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135141/testReport)** for PR 31476 at commit [`b8c762a`](https://github.com/apache/spark/commit/b8c762a776d535a1083c424a61922dc8d1401ccd).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r574009954



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       @cloud-fan Removing `trait` is a little different story. The point was we don't need to change `Scan` interface like this every time. For the use cases which only depends on `Scan` interface, we can guarantee that nothing is broken.
   > I don't see how a mixin trait helps avoid breaking change. Removing the trait later is also a breaking change.
   
   However, I also don't want to block this PR like this for a long time. So, I switch to give +1 for @rdblue's suggestion.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r599346059



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomTaskMetric.java
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+
+/**
+ * A custom task metric. This is a logical representation of a metric reported by data sources
+ * at the executor side. During query execution, Spark will collect the task metrics per partition
+ * by {@link PartitionReader} and update internal metrics based on collected metric values.
+ * For streaming query, Spark will collect and combine metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. How
+ * the task metrics are combined is defined by corresponding {@link CustomMetric} with same metric
+ * name. The final result will be shown up in the physical operator in Spark UI.

Review comment:
       `physical operator` -> `data source scan operator`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-783184108


   @cloud-fan @rdblue @Ngone51 @sunchao @zsxwing If you have more comments, please let me know. Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804506784


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40953/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804912608


   Thank you, @viirya and all! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zsxwing commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
zsxwing commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r580486702



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.
+     * SUM: Spark sums up metrics from partitions as the final result.
+     */
+    enum MetricType {
+      SUM

Review comment:
       By the way, regarding the naming, I would prefer `MergeType` to make it clear if we did go to this direction.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804488767


   **[Test build #136370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136370/testReport)** for PR 31476 at commit [`f46b733`](https://github.com/apache/spark/commit/f46b733c2ec276dad31aa7732ff2349fd4363e52).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570574172



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReader.java
##########
@@ -48,4 +49,12 @@
    * Return the current record. This method should return same value until `next` is called.
    */
   T get();
+
+  /**
+   * Returns an array of custom metrics. By default it returns empty array.
+   */
+  default CustomMetric[] getCustomMetrics() {

Review comment:
       Also, can we split this as a trait, too?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-781915871


   **[Test build #135252 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135252/testReport)** for PR 31476 at commit [`cce58a5`](https://github.com/apache/spark/commit/cce58a56d5c48cb351688411d65159779550b5de).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804983017


   **[Test build #136396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136396/testReport)** for PR 31476 at commit [`eb9d94a`](https://github.com/apache/spark/commit/eb9d94a07a38d6efcd71f12cc1c2db22bf8a7019).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773605675


   **[Test build #134891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134891/testReport)** for PR 31476 at commit [`623f193`](https://github.com/apache/spark/commit/623f1932133cb7e4d77ab5bd3f1356e7e7b8da78).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zsxwing commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
zsxwing commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r580485941



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.
+     * SUM: Spark sums up metrics from partitions as the final result.
+     */
+    enum MetricType {
+      SUM

Review comment:
       Any reason why we do not allow users to define the combine method instead? The current API means if a user needs to use a different combine behavior, they need to submit a PR to Spark to add it and wait for a new Spark release.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan edited a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
cloud-fan edited a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-785170297


   I think it's too much work to support arbitrary accumulators in `SQLMetrics`. During these years, it seems sufficient to support various metrics (size, time, average, etc.) with simple long values. So I agree with the design here that the v2 metrics API should leverage `SQLMetrics` under the hood.
   
   However, I share the same concern from @zsxwing that using enum to indicate the metrics merging logic is too limited. I don't think it's hard to design an API to merge a bunch of long values.
   
   First, let's look at the task side. The read/write task needs to update the value of executor-side `SQLMetrics`, and Spark will send back the metrics update to the driver via heartbeat event or task complete event. The current PR adds a new method to `PartitionReader` to report the current metrics values, which looks good. Spark needs to call the new method at the end of task (at the end of each epoch for continuous mode), get the metrics value, find the corresponding `SQLMetrics` and update its value. We can polish the API from the current PR a little bit:
   ```
   interface CustomTaskMetrics {
     def name: String
     def value: Long
   }
   
   interface PartitionReader {
     ...
     def currentMetricsValues: Array[CustomTaskMetrics] = Nil
   }
   ```
   
   Then, let's look at the driver side. The driver side needs to aggregate the task metrics and produce a string that will be displayed in the UI. The current PR adds a new API `CustomMetrics` which is returned by `Scan`. This looks good and we just need to replace the enum `MetricsType` with a real aggregate method
   ```
   interface CustomMetrics {
     def name: String
     ...
     def initialValue: Long = 0
     def aggregateTaskMetrics(taskMetrics: Array[Long]): String
   }
   ```
   
   Internally, we can add a new `V2_CUSTOM` metrics type in `SQLMetrics`. For this metrics type, we delegate the aggregating logic to the corresponding `CustomMetrics`.
   
   What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773738285


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39482/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-773727733


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39482/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-781848150


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39832/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804515859


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40954/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-781937240


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135252/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-781848150


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39832/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r572241692



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       I'm okay with a separate trait. I wouldn't want to block this on a minor issue, but I don't think it is the right choice. If the general consensus is to go with a trait, please more forward.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya edited a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya edited a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-803471636


   @dongjoon-hyun Sorry for late. I will update this based on above discussion with @cloud-fan and @zsxwing. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570648003



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/SupportsReportMetrics.java
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * Data sources can implement this interface to
+ * report supported custom metrics to Spark in read/write path.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface SupportsReportMetrics {
+
+    /**
+     * Returns an array of supported custom metrics with name and description.
+     * By default it returns empty array.
+     */
+    default CustomMetric[] supportedCustomMetrics() {

Review comment:
       2-space Indentation?

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReaderWithMetrics.java
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.CustomMetric;
+
+/**
+ * A mix in interface for {@link PartitionReader}. Partition reader can this interface to
+ * report custom metrics to Spark.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface PartitionReaderWithMetrics<T> extends PartitionReader<T> {
+
+    /**
+     * Returns an array of custom metrics. By default it returns empty array.
+     */
+    default CustomMetric[] getCustomMetrics() {

Review comment:
       2-space Indentation?

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReaderWithMetrics.java
##########
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.CustomMetric;
+
+/**
+ * A mix in interface for {@link PartitionReader}. Partition reader can this interface to

Review comment:
       `can this` -> `can implement this`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804832004


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40979/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-803471636


   @dongjoon-hyun I will update this based on above discussion with @cloud-fan and @zsxwing. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r573977591



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       I confirmed with @dongjoon-hyun offline and it's okay to move towards optional methods. Thanks all for the discussion.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-778669917


   **[Test build #135141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135141/testReport)** for PR 31476 at commit [`b8c762a`](https://github.com/apache/spark/commit/b8c762a776d535a1083c424a61922dc8d1401ccd).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570579980



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReader.java
##########
@@ -48,4 +49,12 @@
    * Return the current record. This method should return same value until `next` is called.
    */
   T get();
+
+  /**
+   * Returns an array of custom metrics. By default it returns empty array.
+   */
+  default CustomMetric[] getCustomMetrics() {

Review comment:
       I see. I will make it as a trait.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r580807887



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/CustomMetric.java
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.read.PartitionReader;
+import org.apache.spark.sql.connector.read.Scan;
+
+/**
+ * A custom metric. This is a logical representation of a metric reported by data sources during
+ * read path. Data sources can report supported metric list by {@link Scan} to Spark in query
+ * planning. During query execution, Spark will collect the metrics per partition by
+ * {@link PartitionReader} and combine metrics from partitions to the final result. How Spark
+ * combines metrics depends on the metric type. For streaming query, Spark will collect and combine
+ * metrics for a final result per micro batch.
+ *
+ * The metrics will be gathered during query execution back to the driver and then combined. The
+ * final result will be shown up in the physical operator in Spark UI.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface CustomMetric {
+    /**
+     * Returns the name of custom metric.
+     */
+    String name();
+
+    /**
+     * Returns the description of custom metric.
+     */
+    String description();
+
+    /**
+     * Supported metric type. The metric types must be supported by Spark SQL internal metrics.
+     * SUM: Spark sums up metrics from partitions as the final result.
+     */
+    enum MetricType {
+      SUM

Review comment:
       As you can see, the metric API here is a logical representation of metrics from DS v2. We are not going to re-invent a whole metric API. SQLMetrics are internal to Spark. It is not exposed to end users and data source developers, so I don't think it worries me too much.
   
   I'm not saying that we should not build a good API for DS v2 developers. Seems to me some points in above comments are from end user perspective, I'd like to point out this is for different scenarios.
   
   As this is used for DS v2 purpose, it is for SQL metrics and internally it is converted to SQL metrics. To make SQLMetric support Accmulator and let DS v2 reports Accmulator does not sound bad idea to me. But I'd doubt if it is worth.
   
   One argued point is to define arbitrary combine behavior. Once making SQLMetric support Accmulator, does it mean that we can use arbitrary Accmulator? No, basically SQLMetric allows certain types of metrics, we still need to change SQLMetric to support new metrics.
   
   So the only benefit I thought is to not have another metric API. And I don't think it is serious for this case at the beginning.
    
   This API is pretty simple as it is just logical representation and we only need small change internally to convert collected metrics from DS v2 to SQL metrics.
   
   I just read through the code path to be touched in order to make SQLMetric support Accmulator. Seems it involves more changes not only in DS v2 but maybe also in sql/core, etc.
   
   Although I doubt if it is worth, I'm open to the suggested Accmulator approach. Let's gather more thoughts from others?
   
   cc @cloud-fan @dongjoon-hyun @rdblue @Ngone51 @sunchao WDYT?
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #31476:
URL: https://github.com/apache/spark/pull/31476#discussion_r570505479



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/CustomMetric.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;

Review comment:
       can this be used by both read and write path? 

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/LongMetric.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom metric that reports a long value.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface LongMetric extends CustomMetric {
+  /**
+   * Returns the name of custom metric.
+   */
+  String name();

Review comment:
       why it needs to redefine name and description?

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
##########
@@ -102,4 +102,13 @@ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
   default ContinuousStream toContinuousStream(String checkpointLocation) {
     throw new UnsupportedOperationException(description() + ": Continuous scan are not supported");
   }
+
+  /**
+   * Returns an array of supported custom metrics with name and description.
+   * By default it returns empty array.
+   */
+  default CustomMetric[] supportedCustomMetrics() {

Review comment:
       maybe we should have a `SupportsReportMetrics` trait with this default implementation?

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/LongMetric.java
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom metric that reports a long value.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface LongMetric extends CustomMetric {
+  /**
+   * Returns the name of custom metric.
+   */
+  String name();
+
+  /**
+   * Returns the description of custom metric.
+   */
+  String description();
+
+  /**
+   * Returns the value of custom metric.
+   */
+  Long value();

Review comment:
       nit: would it be better to use `long` to avoid boxing/unboxing?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804523364


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40953/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan edited a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
cloud-fan edited a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-785170297


   I think it's too much work to support arbitrary accumulators in `SQLMetrics`. During these years, it seems sufficient to support various metrics (size, time, average, etc.) with simple long values. So I agree with the design here that the v2 metrics API should leverage `SQLMetrics` under the hood.
   
   However, I share the same concern from @zsxwing that using enum to indicate the metrics merging logic is too limited. I don't think it's hard to design an API to merge a bunch of long values.
   
   First, let's look at the task side. The read/write task needs to update the value of executor-side `SQLMetrics`, and Spark will send back the metrics update to the driver via heartbeat event or task complete event. The current PR adds a new method to `PartitionReader` to report the current metrics values, which looks good. Spark needs to call the new method at the end of task (at the end of each epoch for continuous mode), get the metrics value, find the corresponding `SQLMetrics` and update its value. We can polish the API from the current PR a little bit:
   ```
   interface CustomTaskMetrics {
     def name: String
     def value: Long
   }
   
   interface PartitionReader {
     ...
     def currentMetricsValues: Array[CustomTaskMetrics] = Nil
   }
   ```
   
   Then, let's look at the driver side. The driver side needs to aggregate the task metrics and produce a string that will be displayed in the UI. The current PR adds a new API `CustomMetrics` which is returned by `Scan`. This looks good and we just need to replace the enum `MetricsType` with a real aggregate method
   ```
   interface CustomMetrics {
     def name: String
     def description: String
     def initialValue: Long = 0
     def aggregateTaskMetrics(taskMetrics: Array[Long]): String
   }
   ```
   
   Internally, we can add a new `V2_CUSTOM` metrics type in `SQLMetrics`. For this metrics type, we delegate the aggregating logic to the corresponding `CustomMetrics`.
   
   What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan edited a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
cloud-fan edited a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-785170297


   I think it's too much work to support arbitrary accumulators in `SQLMetrics`. During these years, it seems sufficient to support various metrics (size, time, average, etc.) with simple long values. So I agree with the design here that the v2 metrics API should leverage `SQLMetrics` under the hood.
   
   However, I share the same concern from @zsxwing that using enum to indicate the metrics merging logic is too limited. I don't think it's hard to design an API to merge a bunch of long values.
   
   First, let's look at the task side. The read/write task needs to update the value of executor-side `SQLMetrics`, and Spark will send back the metrics update to the driver via heartbeat event or task complete event. The current PR adds a new method to `PartitionReader` to report the current metrics values, which looks good. Spark needs to call the new method at the end of task (at the end of each epoch for continuous mode), get the metrics value, find the corresponding `SQLMetrics` and update its value. We can polish the API from the current PR a little bit:
   ```
   interface CustomTaskMetrics {
     def name: String
     def value: Long
   }
   
   interface PartitionReader {
     ...
     def currentMetrics: Array[CustomTaskMetrics] = Nil
   }
   ```
   
   Then, let's look at the driver side. The driver side needs to aggregate the task metrics and produce a string that will be displayed in the UI. The current PR adds a new API `CustomMetrics` which is returned by `Scan`. This looks good and we just need to replace the metrics type with a real merge method
   ```
   interface CustomMetrics {
     def name: String
     ...
     def initialValue: Long = 0
     def aggregateTaskMetrics(taskMetrics: Array[Long]): String
   }
   ```
   
   Internally, we can add a new `V2_CUSTOM` metrics type in `SQLMetrics`. For this metrics type, we delegate the aggregating logic to the corresponding `CustomMetrics`.
   
   What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804890588


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40979/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804898898


   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org