Posted to dev@spark.apache.org by Gerard Maas <ge...@gmail.com> on 2014/11/20 21:25:58 UTC

Spark Streaming Metrics

As the Spark Streaming tuning guide indicates, the key indicators of a
healthy streaming job are:
- Processing Time
- Total Delay

The Spark UI page for the Streaming job [1] shows these two indicators but
the metrics source for Spark Streaming (StreamingSource.scala)  [2] does
not.

Any reasons for that? I would like to monitor job performance through an
external monitor (Ganglia in our case) and I've already connected the
currently published metrics.
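
For illustration, one possible workaround until the metrics source publishes
these values is a listener along the following lines. This is a minimal
sketch, not Spark's own implementation: BatchDelayListener and the gauge
names are hypothetical, it assumes the Codahale metrics library is on the
classpath, and it registers the gauges in a MetricRegistry of its own, which
would need its own reporter (for example to Ganglia) because it is separate
from the registry behind Spark's metrics system.

```scala
import com.codahale.metrics.{Gauge, MetricRegistry}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Hypothetical helper, not part of Spark: remembers the delays of the last
// completed batch and exposes them as gauges in a registry owned by the caller.
class BatchDelayListener(registry: MetricRegistry) extends StreamingListener {
  // -1 means "no completed batch seen yet" (an arbitrary convention for this sketch).
  @volatile private var lastProcessingDelay: Long = -1L
  @volatile private var lastTotalDelay: Long = -1L

  // Registers a gauge that re-evaluates `read` every time it is polled.
  private def gauge(name: String)(read: => Long): Unit =
    registry.register(name, new Gauge[Long] { override def getValue: Long = read })

  // Gauge names are assumptions chosen for this example.
  gauge("streaming.lastCompletedBatch.processingDelayMs")(lastProcessingDelay)
  gauge("streaming.lastCompletedBatch.totalDelayMs")(lastTotalDelay)

  // Called by Spark Streaming after each batch finishes processing.
  override def onBatchCompleted(event: StreamingListenerBatchCompleted): Unit = {
    val info = event.batchInfo
    lastProcessingDelay = info.processingDelay.getOrElse(-1L)
    lastTotalDelay = info.totalDelay.getOrElse(-1L)
  }
}
```

It would be attached with ssc.addStreamingListener(new BatchDelayListener(registry)),
where ssc is the job's StreamingContext and registry is the MetricRegistry
that the external reporter reads from.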

-kr,  Gerard.


[1]
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala#L127

[2]
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala

Re: Spark Streaming Metrics

Posted by andy petrella <an...@gmail.com>.
Yo,

I've talked with some guys from Cloudera who are working (only oO) on
spark-core and streaming.
The streaming guy was telling me the same thing about the scheduling part.

Do you have some nice screenshots and info about running stages, task time,
Akka health and things like that? I told the guy that I might poke him
today with more material.

Btw, how're you?

Tchuss man
andy

PS: did you try the recent events thingy?


On Fri Nov 21 2014 at 11:17:17 AM Gerard Maas <ge...@gmail.com> wrote:

> Looks like metrics are not a hot topic to discuss - yet so important to
> sleep well when jobs are running in production.
>
> I've created Spark-4537 <https://issues.apache.org/jira/browse/SPARK-4537>
> to track this issue.
>
> -kr, Gerard.
>
> On Thu, Nov 20, 2014 at 9:25 PM, Gerard Maas <ge...@gmail.com>
> wrote:
>
>> As the Spark Streaming tuning guide indicates, the key indicators of a
>> healthy streaming job are:
>> - Processing Time
>> - Total Delay
>>
>> The Spark UI page for the Streaming job [1] shows these two indicators
>> but the metrics source for Spark Streaming (StreamingSource.scala)  [2]
>> does not.
>>
>> Any reasons for that? I would like to monitor job performance through an
>> external monitor (Ganglia in our case) and I've connected already the
>> currently published metrics.
>>
>> -kr,  Gerard.
>>
>>
>> [1]
>> https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala#L127
>>
>> [2]
>> https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala
>>
>
>

Re: Spark Streaming Metrics

Posted by Gerard Maas <ge...@gmail.com>.
Looks like metrics are not a hot topic to discuss - yet so important to
sleep well when jobs are running in production.

I've created SPARK-4537 <https://issues.apache.org/jira/browse/SPARK-4537>
to track this issue.

-kr, Gerard.

On Thu, Nov 20, 2014 at 9:25 PM, Gerard Maas <ge...@gmail.com> wrote:

> As the Spark Streaming tuning guide indicates, the key indicators of a
> healthy streaming job are:
> - Processing Time
> - Total Delay
>
> The Spark UI page for the Streaming job [1] shows these two indicators but
> the metrics source for Spark Streaming (StreamingSource.scala)  [2] does
> not.
>
> Any reasons for that? I would like to monitor job performance through an
> external monitor (Ganglia in our case) and I've connected already the
> currently published metrics.
>
> -kr,  Gerard.
>
>
> [1]
> https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingPage.scala#L127
>
> [2]
> https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala
>
