Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/30 18:38:24 UTC

[GitHub] [spark] wypoon edited a comment on pull request #31451: [WIP][SPARK-34338][SQL] Report metrics from Datasource v2 scan

wypoon edited a comment on pull request #31451:
URL: https://github.com/apache/spark/pull/31451#issuecomment-810480565


   > > Can you please explain why you changed from passing a completion function to `DataSourceRDD` to passing a `Map[String, SQLMetric]`? What is the benefit?
   > 
   > It fits better with the current approach of using `SQLMetric`. This approach seems clearer to me. We now update custom metrics while consuming the data, instead of at the completion of data consumption.
   
   Thanks for the explanation. This sounds like a change from the API discussed in https://github.com/apache/spark/pull/31476. IIUC, before, the expectation was that `PartitionReader#currentMetricsValues()` is called once, after the partition has been read. Now, the expectation is that `PartitionReader#currentMetricsValues()` is called on every iteration through the reader. This expectation should be documented clearly in the API for implementors of custom metrics.
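   To illustrate the behavioral difference being discussed, here is a minimal sketch of the per-iteration consumption pattern. The interfaces below are simplified stand-ins for Spark's DSv2 `PartitionReader` and `CustomTaskMetric` (the real ones live under `org.apache.spark.sql.connector`), and `RowCountingReader`, the `rowsRead` metric name, and the driver loop are all illustrative assumptions, not code from this PR:

```java
// Simplified stand-ins for Spark's DSv2 interfaces (illustrative only).
interface CustomTaskMetric {
    String name();
    long value();
}

interface PartitionReader<T> {
    boolean next();
    T get();
    // Under the newer expectation, this is polled on every iteration,
    // not just once after the partition has been fully read.
    CustomTaskMetric[] currentMetricsValues();
}

// A hypothetical reader that reports rows consumed so far as a custom metric.
class RowCountingReader implements PartitionReader<Integer> {
    private final int[] rows;
    private int pos = -1;

    RowCountingReader(int[] rows) { this.rows = rows; }

    @Override public boolean next() { return ++pos < rows.length; }
    @Override public Integer get() { return rows[pos]; }

    @Override public CustomTaskMetric[] currentMetricsValues() {
        final long count = pos + 1;  // rows consumed so far
        return new CustomTaskMetric[] {
            new CustomTaskMetric() {
                @Override public String name() { return "rowsRead"; }
                @Override public long value() { return count; }
            }
        };
    }
}

public class MetricsIterationSketch {
    public static void main(String[] args) {
        PartitionReader<Integer> reader = new RowCountingReader(new int[] {10, 20, 30});
        // Per-iteration consumption loop: metric values are read each time a
        // row is produced, so implementations must keep them always current.
        while (reader.next()) {
            reader.get();
            for (CustomTaskMetric m : reader.currentMetricsValues()) {
                System.out.println(m.name() + "=" + m.value());
            }
        }
    }
}
```

   The key point for implementors: because the values are polled mid-scan rather than once at completion, `currentMetricsValues()` cannot defer its bookkeeping to `close()`.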


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org