Posted to user@spark.apache.org by Tim Moran <ti...@privitar.com> on 2016/10/17 12:37:19 UTC

OutputMetrics with data frames (spark-avro)

Hi,

I'm using the Databricks spark-avro library to save some DataFrames as Avro
(with Spark 1.6.1). However, when I do this, the Spark events no longer
include the number of records and the size of the data written to HDFS for
each partition. That information is available if I save an RDD as a text
file.

Is this just a limitation of DataFrames, or is there a way to make this
information available? It's really useful for performance monitoring.

Thanks,

Tim.
