Posted to user@spark.apache.org by nelson <ne...@ysance.com> on 2014/12/19 12:59:21 UTC

Batch timestamp in spark streaming

Hi all,

I know this topic has been discussed before, but I couldn't find an answer
that suits my case.

How do you retrieve the current batch timestamp in Spark Streaming? Maybe
via BatchInfo, but that does not seem to be reachable from the streaming
context. I currently have one-minute micro-batches, and I'd like to tag
every log I process with the start time of the batch it belongs to.

I also thought about broadcasting a new timestamp at the end of every batch
via a StreamingListener, but I couldn't manage to overwrite the first
broadcast value.

Do you guys have any ideas?
Thanks a lot,

Nelson.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Batch-timestamp-in-spark-streaming-tp20786.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Batch timestamp in spark streaming

Posted by Sean Owen <so...@cloudera.com>.
Most of the methods of DStream will let you supply a function that
receives a timestamp as an argument of type Time. For example, we have

def foreachRDD(foreachFunc: RDD[T] => Unit)

but also

def foreachRDD(foreachFunc: (RDD[T], Time) => Unit)

If you supply the latter, you will get the timestamp of the batch as
an argument from Spark.
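
To make that concrete, here is a minimal sketch of the two-argument overload in use. Assumptions not in the thread: `logs` is a hypothetical DStream[String] of log lines, and the output path is a placeholder sink.

```scala
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

// Sketch: `logs` is a hypothetical DStream[String] of log lines.
def tagWithBatchTime(logs: DStream[String]): Unit = {
  logs.foreachRDD { (rdd, time: Time) =>
    // `time` is the batch time Spark passes in; time.milliseconds
    // is the epoch-millisecond start of this micro-batch.
    val batchStart = time.milliseconds
    rdd.map(line => (batchStart, line))
       .saveAsTextFile(s"logs-$batchStart") // placeholder sink
  }
}
```

Every record written in a given batch then carries the same batch-start timestamp, which is exactly the linkage asked for above.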

On Fri, Dec 19, 2014 at 11:59 AM, nelson <ne...@ysance.com> wrote:
> [quoted message trimmed]



Re: Batch timestamp in spark streaming

Posted by Gerard Maas <ge...@gmail.com>.
You could use dstream.foreachRDD( (rdd, timestamp) => ???)  to get access
to the time of each batch.
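
If you want the timestamp inside the stream itself rather than only at the output stage, the Time-taking overload of `transform` can pair each record with its batch time. A sketch under the same assumption that `logs` is a hypothetical DStream[String]:

```scala
import org.apache.spark.streaming.Time
import org.apache.spark.streaming.dstream.DStream

// Sketch: pair every record with the start time of its batch,
// yielding a DStream of (batchStartMillis, logLine) tuples.
def withBatchTime(logs: DStream[String]): DStream[(Long, String)] =
  logs.transform { (rdd, time: Time) =>
    rdd.map(line => (time.milliseconds, line))
  }
```

The resulting stream can then be filtered, joined, or persisted downstream with the batch timestamp already attached to each record.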

On Fri, Dec 19, 2014 at 12:59 PM, nelson <ne...@ysance.com> wrote:
> [quoted message trimmed]