You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Chen Song <ch...@gmail.com> on 2016/02/08 17:34:49 UTC
Access batch statistics in Spark Streaming
Apologize in advance if someone has already asked and addressed this
question.
In Spark Streaming, how can I programmatically get the batch statistics
like schedule delay, total delay and processing time (They are shown in the
job UI streaming tab)? I need such information to raise alerts in some
circumstances. For example, if the scheduling is delayed more than a
threshold.
Thanks,
Chen
Re: Access batch statistics in Spark Streaming
Posted by Bryan Jeffrey <br...@gmail.com>.
>From within a Spark job you can use a Periodic Listener:
ssc.addStreamingListener(PeriodicStatisticsListener(Seconds(60)))
class PeriodicStatisticsListener(timePeriod: Duration) extends
StreamingListener {
private val logger = LoggerFactory.getLogger("Application")
override def onBatchCompleted(batchCompleted :
org.apache.spark.streaming.scheduler.StreamingListenerBatchCompleted) :
scala.Unit = {
if(startTime == Time(0)) {
startTime = batchCompleted.batchInfo.batchTime
}
logger.info("Batch Complete @ " + new
DateTime(batchCompleted.batchInfo.batchTime.milliseconds).withZone(DateTimeZone.UTC)
+ " (" + batchCompleted.batchInfo.batchTime + ")" +
" with records " + batchCompleted.batchInfo.numRecords +
" in processing time " +
batchCompleted.batchInfo.processingDelay.getOrElse(0.toLong) / 1000 + "
seconds")
}
On Mon, Feb 8, 2016 at 11:34 AM, Chen Song <ch...@gmail.com> wrote:
> Apologize in advance if someone has already asked and addressed this
> question.
>
> In Spark Streaming, how can I programmatically get the batch statistics
> like schedule delay, total delay and processing time (They are shown in the
> job UI streaming tab)? I need such information to raise alerts in some
> circumstances. For example, if the scheduling is delayed more than a
> threshold.
>
> Thanks,
> Chen
>
>