Posted to reviews@spark.apache.org by "parthchandra (via GitHub)" <gi...@apache.org> on 2024/03/13 22:41:19 UTC

[PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

parthchandra opened a new pull request, #45505:
URL: https://github.com/apache/spark/pull/45505

   ### What changes were proposed in this pull request?
   Provides a new interface `CustomFileTaskMetric` that extends `CustomTaskMetric` and allows metric values to be updated.
   
   ### Why are the changes needed?
   The current interface for providing custom metrics does not support file-based metrics in the Parquet reader, where a single `FilePartitionReader` may need to collect metrics from multiple Parquet file readers.
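   To make the intent concrete, here is a rough sketch of how such a metric could be used (illustrative only; `FileReadTimeMetric` and the `fileReadTime` name are made up for this example, and the actual signatures are in the diff reviewed below):
   ```java
   import org.apache.spark.sql.connector.metric.CustomTaskMetric;

   // Proposed extension: a task metric whose value can be accumulated in place.
   interface CustomFileTaskMetric extends CustomTaskMetric {
     // Adds addValue to the current value of the metric.
     default void update(long addValue) {}
   }

   // Hypothetical metric a single Parquet file reader could report; the parent
   // FilePartitionReader would call update() each time a file reader completes.
   class FileReadTimeMetric implements CustomFileTaskMetric {
     private long total = 0L;

     @Override
     public String name() { return "fileReadTime"; }

     @Override
     public long value() { return total; }

     @Override
     public void update(long addValue) { total += addValue; }
   }
   ```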
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   This PR only adds the interface. The implementation and tests will be done in a follow-up PR that addresses https://issues.apache.org/jira/browse/SPARK-47291.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No




Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "parthchandra (via GitHub)" <gi...@apache.org>.
parthchandra commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1529029527


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ *
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {

Review Comment:
   Actually, let me rework it without this interface





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on PR #45505:
URL: https://github.com/apache/spark/pull/45505#issuecomment-2007883753

   Thank you @parthchandra 




Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1526917192


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ *
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {

Review Comment:
   I mean it seems feasible to use `CustomTaskMetric` to achieve the same goal.



##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ *
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {

Review Comment:
   I mean it seems feasible to use `CustomTaskMetric` to achieve the same goal without adding a new method.





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "parthchandra (via GitHub)" <gi...@apache.org>.
parthchandra commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1526908137


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ *
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {

Review Comment:
   I wasn't sure if it was OK to change an existing public interface. If it is acceptable to simply add this to `CustomTaskMetric`, that would be perfect.





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1526923182


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ *
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {

Review Comment:
   +1 for @viirya 's advice.





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1526917192


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ *
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {

Review Comment:
   I mean it seems feasible to use `CustomTaskMetric` to achieve the same goal without adding a new method.
   
   The new API here is the `update` method. You can definitely update the current value of a `CustomTaskMetric` without a problem.
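   For example, something along these lines already works against the existing API (a rough sketch; `AccumulatingTaskMetric` is a made-up name):
   ```java
   import org.apache.spark.sql.connector.metric.CustomTaskMetric;

   // A plain CustomTaskMetric backed by a mutable field. Spark only reads the
   // metric through value(), so the reader itself is free to mutate it.
   class AccumulatingTaskMetric implements CustomTaskMetric {
     private final String name;
     private long total = 0L;

     AccumulatingTaskMetric(String name) { this.name = name; }

     // Called by the partition reader whenever a file reader finishes.
     void add(long delta) { total += delta; }

     @Override
     public String name() { return name; }

     @Override
     public long value() { return total; }
   }
   ```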





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1525027613


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {
+
+  /**
+   * Updates the underlying value of the metric by adding addValue to the existing value
+   */
+  default void update(long addValue) {}
+
+
+  /*
+  Merge(add) the values of the corresponding CustomTaskMetric from src array into target array
+  adding a new element if it doesn't already exist. Returns a new array without modifying the
+  target array

Review Comment:
   Indentation?





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1525027264


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ * @since 4.0.0

Review Comment:
   nit. Please add a new empty line before this.





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1525034616


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {
+
+  /**
+   * Updates the underlying value of the metric by adding addValue to the existing value
+   */
+  default void update(long addValue) {}
+
+
+  /*
+  Merge(add) the values of the corresponding CustomTaskMetric from src array into target array
+  adding a new element if it doesn't already exist. Returns a new array without modifying the
+  target array
+  */
+  static List<CustomFileTaskMetric> mergeMetricValues(List<CustomTaskMetric> src,

Review Comment:
   Shall we move this static method outside of this interface? I guess you couldn't find a proper existing utility class for this, right?





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "parthchandra (via GitHub)" <gi...@apache.org>.
parthchandra closed pull request #45505: [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources
URL: https://github.com/apache/spark/pull/45505




Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "parthchandra (via GitHub)" <gi...@apache.org>.
parthchandra commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1525407443


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+import org.apache.spark.annotation.Evolving;

Review Comment:
   Done



##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ * @since 4.0.0

Review Comment:
   Done



##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {
+
+  /**
+   * Updates the underlying value of the metric by adding addValue to the existing value
+   */
+  default void update(long addValue) {}
+
+
+  /*
+  Merge(add) the values of the corresponding CustomTaskMetric from src array into target array
+  adding a new element if it doesn't already exist. Returns a new array without modifying the
+  target array

Review Comment:
   Reformatted



##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {
+
+  /**
+   * Updates the underlying value of the metric by adding addValue to the existing value
+   */
+  default void update(long addValue) {}
+
+
+  /*
+  Merge(add) the values of the corresponding CustomTaskMetric from src array into target array
+  adding a new element if it doesn't already exist. Returns a new array without modifying the
+  target array
+  */
+  static List<CustomFileTaskMetric> mergeMetricValues(List<CustomTaskMetric> src,

Review Comment:
   That's right. This is a utility method and there really isn't any other good place to put it. Anyway, I've moved it to `org.apache.spark.sql.execution.metric.CustomMetrics`. One difference is that the function was previously in a Java class and is now in a Scala class. In general, that makes it slightly harder for a DSV2 implementation to use, because calling Scala from Java is still not perfect.
   





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1526899957


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ *
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {

Review Comment:
   Hmm, I am wondering why we need this interface? I think we can definitely do the same thing with `CustomTaskMetric`?





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "parthchandra (via GitHub)" <gi...@apache.org>.
parthchandra commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1528983875


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ *
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {

Review Comment:
   DSV1 uses `SQLMetric`, which has a `set(v: Long)` and an `add(v: Long)`, so a data source can set an initial value for a metric and then keep updating it. DSV2 uses `CustomTaskMetric`, which really should have an equivalent but doesn't. That's what this interface is trying to provide.
   
   In this case, `FilePartitionReader` needs to update the value of a `CustomTaskMetric` every time a file reader completes. So `FilePartitionReader` can have a metric, say `ReadTime`, which it updates every time a file reader completes. But only the Parquet file reader provides this metric, so `FilePartitionReader` can query the file reader and get the metrics that are supported; it has no way to update them, though. `FilePartitionReader` can of course create its own set of metrics corresponding to the supported ones, keep its own set of values, and update them as needed, but having an interface to update the value directly on the metric seems so much more elegant.
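   Roughly, the aggregation would look like this (a simplified sketch; `FileMetricAggregator` is a made-up helper, and the real code linked below is Scala and differs in detail):
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import org.apache.spark.sql.connector.metric.CustomTaskMetric;

   // Sketch of how a parent reader could fold per-file metric values into its
   // task-level totals each time one of its child file readers completes.
   class FileMetricAggregator {
     private final Map<String, Long> totals = new HashMap<>();

     // Called once per completed file reader with the metrics it reported.
     void addCompletedReaderMetrics(CustomTaskMetric[] fileMetrics) {
       for (CustomTaskMetric m : fileMetrics) {
         totals.merge(m.name(), m.value(), Long::sum);
       }
     }

     long currentValue(String metricName) {
       return totals.getOrDefault(metricName, 0L);
     }
   }
   ```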
   
   Here's the usage of this interface - https://github.com/parthchandra/spark/blob/SPARK-47291/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FilePartitionReader.scala
   
   And the corresponding metrics - 
   https://github.com/parthchandra/spark/blob/SPARK-47291/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetMetricsV2.scala
   
   If you still think this is not reasonable, I will try to re-work it without this interface.





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1525025397


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+import org.apache.spark.annotation.Evolving;

Review Comment:
   We need a new line before this.





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1526917192


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ *
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {

Review Comment:
   I mean it seems feasible to use `CustomTaskMetric` to achieve the same goal without adding a new method.
   
   The new API here is the `update` method. You can definitely update the current value of a `CustomTaskMetric` without a problem. `CustomTaskMetric` is simply an interface used by the instances collecting metrics to report task-level metrics.





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1526899957


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/metric/CustomFileTaskMetric.java:
##########
@@ -0,0 +1,37 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.metric;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A custom file task metric. This allows file based data source V2 implementations
+ * to use a single PartitionReader with multiple file readers. Each file reader can
+ * provide its own metrics values and they can be added into the parent PartitionReader
+ *
+ * @since 4.0.0
+ */
+@Evolving
+public interface CustomFileTaskMetric extends CustomTaskMetric {

Review Comment:
   Hmm, I am wondering why we need this interface? 





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #45505:
URL: https://github.com/apache/spark/pull/45505#discussion_r1526889640


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala:
##########
@@ -70,4 +70,23 @@ object CustomMetrics {
       }
     }
   }
+
+  /**
+   * Merge(add) the values of the corresponding CustomTaskMetric from src array into target array
+   * adding a new element if it doesn't already exist.
+   */

Review Comment:
   ```suggestion
     /**
      * Update the values of the corresponding CustomTaskMetric with the same metric name from the src array into the target array.
      * If the source metric doesn't already exist, add it to the target metric array.
      */
   ```





Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #45505:
URL: https://github.com/apache/spark/pull/45505#issuecomment-1998703520

   cc @viirya 




Re: [PR] [SPARK-47290][SQL] Extend CustomTaskMetric to allow metric values from multiple sources [spark]

Posted by "parthchandra (via GitHub)" <gi...@apache.org>.
parthchandra commented on PR #45505:
URL: https://github.com/apache/spark/pull/45505#issuecomment-2007857003

   Closing this PR. As @viirya pointed out, it is possible to update `CustomTaskMetric` values without a new interface.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org