Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/30 22:19:05 UTC

[GitHub] [spark] viirya edited a comment on pull request #31451: [SPARK-34338][SQL] Report metrics from Datasource v2 scan

viirya edited a comment on pull request #31451:
URL: https://github.com/apache/spark/pull/31451#issuecomment-810554421


   > Thanks for the explanation. This sounds like a change from the API discussed in #31476. IIUC, before, the expectation was that `PartitionReader#currentMetricsValues()` is called after the partition is read. Now, the expectation is that `PartitionReader#currentMetricsValues()` is called for every row we iterate through in the reader. Such expectation should be documented clearly in the API, for implementors of custom metrics.
   
   I don't see that we have documented in the API exactly when `currentMetricsValues` will be called; that is an implementation detail. If you are worried that an implementation of `currentMetricsValues` might do something time-consuming, we can add a note to the API advising against heavy logic in it.
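
To illustrate what "no heavy logic" could mean in practice, here is a minimal sketch of a reader whose metrics call stays cheap even if it is invoked for every row. Everything in it is an assumption made for illustration: `TaskMetric` and `MetricReportingReader` are stand-ins declared locally so the example compiles without Spark, loosely modeled on the metric and reader interfaces under discussion, and `RowCountingReader` and the `rowsRead` metric name are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

// Stand-in for a task-level metric; declared here only so the sketch is self-contained.
interface TaskMetric {
    String name();
    long value();
}

// Stand-in for the part of the partition reader contract that matters here.
interface MetricReportingReader<T> extends AutoCloseable {
    boolean next();
    T get();
    TaskMetric[] currentMetricsValues();
}

// Hypothetical reader: the counting happens in next(), so reporting is a field read.
final class RowCountingReader implements MetricReportingReader<String> {
    private final Iterator<String> rows;
    private String current;
    private long rowsRead = 0L;

    // Created once; value() reads the live counter each time it is called.
    private final TaskMetric rowsReadMetric = new TaskMetric() {
        @Override public String name() { return "rowsRead"; }
        @Override public long value() { return rowsRead; }
    };
    private final TaskMetric[] metrics = { rowsReadMetric };

    RowCountingReader(List<String> source) {
        this.rows = source.iterator();
    }

    @Override public boolean next() {
        if (!rows.hasNext()) {
            return false;
        }
        current = rows.next();
        rowsRead++; // the only per-row bookkeeping
        return true;
    }

    @Override public String get() {
        return current;
    }

    @Override public TaskMetric[] currentMetricsValues() {
        // No I/O, no locking, no allocation: safe to call once per row or once per partition.
        return metrics;
    }

    @Override public void close() {
        // Nothing to release in this in-memory sketch.
    }
}
```

With this shape, the caller can poll `currentMetricsValues()` as often as it likes, and the implementation does not need to know whether that happens per row or only at the end of the partition. The trade-off in this sketch is that the same array is returned on every call, so callers must treat it as read-only.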
   
   

