Posted to user@spark.apache.org by rapelly kartheek <ka...@gmail.com> on 2014/12/29 16:24:51 UTC

Spark profiler

Hi,

I want to find the time taken to replicate an RDD in a Spark cluster,
along with the computation time on the replicated RDD.

Can someone please suggest a suitable Spark profiler?

Thank you
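
[A coarse way to approximate these timings without a dedicated profiler is
to persist the RDD with a replicated storage level and time the actions from
the driver. A hedged Scala sketch, assuming a live SparkContext `sc`; the
`timed` helper is illustrative, and driver-side wall-clock timing folds in
scheduling overhead, so it only approximates the replication cost:]

```scala
import org.apache.spark.storage.StorageLevel

// Illustrative helper: wall-clock timing of an arbitrary block, in seconds.
def timed[T](body: => T): (T, Double) = {
  val start = System.nanoTime()
  val result = body
  (result, (System.nanoTime() - start) / 1e9)
}

// MEMORY_ONLY_2 keeps two replicas of each partition.
val rdd = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_ONLY_2)

// The first action materializes and replicates the blocks,
// so it measures computation plus replication.
val (_, firstActionSecs) = timed { rdd.count() }

// Later actions read the already-replicated blocks; the difference
// between the two timings roughly approximates the replication overhead.
val (_, computeSecs) = timed { rdd.map(_ * 2).sum() }

println(s"first action (compute + replicate): $firstActionSecs s")
println(s"subsequent computation only: $computeSecs s")
```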

Re: Spark profiler

Posted by Boromir Widas <vc...@gmail.com>.
It would be very helpful if there were such a tool, but the distributed
nature of Spark may make these timings difficult to capture.

I had been trying to run a task where merging the accumulators was taking
an inordinately long time, and this was not reflected in the standalone
cluster's web UI.
What I think would be useful is to publish metrics for the different
lifecycle stages of a job to a system like Ganglia to zero in on
bottlenecks. Perhaps the user could define some of the metrics in the
config.

I have been thinking of tinkering with the metrics publisher to add custom
metrics to get a bigger picture and time breakdown.
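
[Spark's metrics system can already route its built-in metrics to Ganglia
via conf/metrics.properties; the GangliaSink ships in the separate
spark-ganglia-lgpl artifact for licensing reasons, so it must be added to
the classpath. A sketch of the sink configuration, with placeholder
host/port values:]

```
# conf/metrics.properties -- host and port are placeholders;
# requires the spark-ganglia-lgpl package on the classpath
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.host=ganglia-gmond.example.com
*.sink.ganglia.port=8649
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
```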

On Mon, Dec 29, 2014 at 10:24 AM, rapelly kartheek <ka...@gmail.com>
wrote:

> Hi,
>
> I want to find the time taken for replicating an rdd in spark cluster
> along with the computation time on the replicated rdd.
>
> Can someone please suggest a suitable spark profiler?
>
> Thank you
>