Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/10/15 14:41:57 UTC

[GitHub] [spark] hvanhovell opened a new pull request #26127: [SPARK-29348][SQL] This PR adds observable metrics to DBR.

URL: https://github.com/apache/spark/pull/26127
 
 
   ### What changes were proposed in this pull request?
   Observable metrics are named, arbitrary aggregate functions that can be defined on a query (DataFrame). As soon as the execution of a DataFrame reaches a completion point (e.g. a batch query finishes or a streaming query reaches the end of an epoch), a named event is emitted that contains the metrics for the data processed since the last completion point.
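The following minimal, self-contained Python toy model illustrates the "metrics since the last completion point" semantics. It is not Spark's implementation or API. `ObservedMetrics`, `observe`, and `complete` are hypothetical names chosen for illustration. In Spark, metrics are attached with `Dataset.observe`. The sketch only shows how a named set of aggregates can pass rows through unchanged, emit one named event per completion point, and reset in between.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical toy model, not Spark's implementation: each metric is an
# (initial value, update(acc, row) -> new acc) pair.
Aggregate = Tuple[Any, Callable[[Any, dict], Any]]


class ObservedMetrics:
    """A named set of aggregates evaluated over the rows flowing through a query."""

    def __init__(self, name: str, aggregates: Dict[str, Aggregate]) -> None:
        self.name = name
        self.aggregates = aggregates
        self._reset()

    def _reset(self) -> None:
        self.state = {k: init for k, (init, _) in self.aggregates.items()}

    def observe(self, rows: Iterable[dict]) -> Iterator[dict]:
        # Rows pass through unchanged; only the metric state is updated.
        for row in rows:
            for k, (_, update) in self.aggregates.items():
                self.state[k] = update(self.state[k], row)
            yield row

    def complete(self) -> Tuple[str, Dict[str, Any]]:
        # Completion point (batch finished or epoch ended): emit a named event,
        # then reset so the next event only covers data processed after this one.
        event = (self.name, dict(self.state))
        self._reset()
        return event


metrics = ObservedMetrics("my_metrics", {
    "rows": (0, lambda acc, row: acc + 1),
    "max_id": (None, lambda acc, row: row["id"] if acc is None else max(acc, row["id"])),
})

batch1 = list(metrics.observe([{"id": 1}, {"id": 5}, {"id": 3}]))  # epoch 1
event1 = metrics.complete()
print(event1)  # ('my_metrics', {'rows': 3, 'max_id': 5})

list(metrics.observe([{"id": 2}]))  # epoch 2
event2 = metrics.complete()
print(event2)  # ('my_metrics', {'rows': 1, 'max_id': 2})
```

Because state is reset at every completion point, the second event reports only the single row seen in epoch 2, not a running total.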
   
   A user can observe these metrics by attaching a listener to the Spark session. Which listener to attach depends on the execution mode:
   - Batch: `QueryExecutionListener`. This will be called when the query completes. A user can access the metrics by using the `QueryExecution.observedMetrics` map.
   - Streaming: `StreamingQueryListener`. This will be called when the streaming query completes an epoch. A user can access the metrics by using the `StreamingQueryProgress.observedMetrics` map.
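The mode-dependent listener dispatch described in the list can be modeled the same way. The sketch below is a hypothetical plain-Python toy. `ToySession`, `QueryResult`, and `EpochProgress` stand in for the Spark session, `QueryExecution`, and `StreamingQueryProgress`, and none of these names are Spark API. For simplicity, the toy calls listeners synchronously. A batch run notifies batch listeners once. A streaming run notifies streaming listeners after every epoch, with metrics for that epoch only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical toy types, NOT Spark classes. Each named observation maps
# metric names to an aggregate function over the rows of one completion unit.
Observation = Dict[str, Callable[[List[dict]], Any]]
Metrics = Dict[str, Dict[str, Any]]


@dataclass
class QueryResult:    # stands in for QueryExecution (batch mode)
    observed_metrics: Metrics


@dataclass
class EpochProgress:  # stands in for StreamingQueryProgress (streaming mode)
    epoch: int
    observed_metrics: Metrics


def evaluate(observations: Dict[str, Observation], rows: List[dict]) -> Metrics:
    return {name: {k: agg(rows) for k, agg in obs.items()}
            for name, obs in observations.items()}


class ToySession:
    def __init__(self) -> None:
        self.batch_listeners: List[Callable[[QueryResult], None]] = []
        self.streaming_listeners: List[Callable[[EpochProgress], None]] = []

    def run_batch(self, observations: Dict[str, Observation], rows: List[dict]) -> None:
        # Batch: listeners fire once, when the whole query completes.
        result = QueryResult(evaluate(observations, rows))
        for listener in self.batch_listeners:
            listener(result)

    def run_stream(self, observations: Dict[str, Observation], epochs: List[List[dict]]) -> None:
        # Streaming: listeners fire after every epoch, with that epoch's metrics only.
        for i, rows in enumerate(epochs):
            progress = EpochProgress(i, evaluate(observations, rows))
            for listener in self.streaming_listeners:
                listener(progress)


obs = {"my_metrics": {"rows": len, "sum_id": lambda rs: sum(r["id"] for r in rs)}}
session = ToySession()
batch_seen: List[Metrics] = []
stream_seen: List[tuple] = []
session.batch_listeners.append(lambda qe: batch_seen.append(qe.observed_metrics))
session.streaming_listeners.append(lambda p: stream_seen.append((p.epoch, p.observed_metrics)))

session.run_batch(obs, [{"id": 1}, {"id": 2}, {"id": 3}])
session.run_stream(obs, [[{"id": 1}], [{"id": 2}, {"id": 4}]])
print(batch_seen)   # [{'my_metrics': {'rows': 3, 'sum_id': 6}}]
print(stream_seen)  # [(0, {'my_metrics': {'rows': 1, 'sum_id': 1}}), (1, {'my_metrics': {'rows': 2, 'sum_id': 6}})]
```

Keying the result by observation name mirrors how the PR exposes metrics as a map (`observedMetrics`), so a single listener can handle several named observations on the same query.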
   
   ### Why are the changes needed?
   To let users define named aggregate metrics on a query and observe their values, for both batch and streaming queries.
   
   ### Does this PR introduce any user-facing change?
   Yes. It adds the `observe` method to `Dataset`.
   
   ### How was this patch tested?
   - Added unit tests for the `CollectMetrics` logical node to the `AnalysisSuite`.
   - Added unit tests for `StreamingQueryProgress` JSON serialization to the `StreamingQueryStatusAndProgressSuite`.
   - Added integration tests for streaming to the `StreamingQueryListenerSuite`.
   - Added integration tests for batch to the `DataFrameCallbackSuite`.
   
