Posted to user@spark.apache.org by Michael Ravits <mi...@gmail.com> on 2016/06/06 13:54:35 UTC

Logging from transformations in PySpark

Hi,

I'd like to send some performance metrics from some of the transformations
to StatsD.
I understand that I would need to create a new connection to StatsD from
within each transformation, which I'm afraid would harm performance.
I've also read that there is a workaround for this in Scala: defining the
connection object as transient (and lazy), so it is initialized on each
executor rather than serialized with the closure.
My question is whether that's also possible in Python with PySpark.
Specifically, I'd like to lazily initialize a transient object that will be
used for sending metrics to StatsD over a local socket connection.
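
To make it concrete, here is a minimal sketch of the pattern I have in
mind: a module-level global that each Python worker initializes lazily on
first use. The metric name, the 127.0.0.1:8125 address, and the stand-in
transformation logic are just placeholders:

    import socket
    import time

    _statsd_sock = None  # one UDP socket per Python worker process

    def _get_statsd():
        # Lazily create the socket the first time a task on this worker
        # needs it. The global starts out as None, so no live connection
        # is serialized with the closure; each worker builds its own.
        global _statsd_sock
        if _statsd_sock is None:
            _statsd_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        return _statsd_sock

    def timed_transform(record):
        start = time.time()
        result = record * 2  # stand-in for the real transformation logic
        elapsed_ms = int((time.time() - start) * 1000)
        # Plain StatsD timing datagram sent to a local agent.
        _get_statsd().sendto(
            ("myapp.transform:%d|ms" % elapsed_ms).encode("ascii"),
            ("127.0.0.1", 8125))
        return result

    # Usage: rdd.map(timed_transform)

Would that behave the way a transient lazy val does in Scala, or is there
a better-supported approach, e.g. mapPartitions with one connection per
partition?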

Thanks,
Michael