Posted to user@spark.apache.org by Bill Jay <bi...@gmail.com> on 2014/07/23 23:39:18 UTC

Get Spark Streaming timestamp

Hi all,

I have a question regarding Spark Streaming. When I use the
saveAsTextFiles function with a 60-second batch interval, Spark generates
a series of files such as:

result-1406148960000, result-1406149020000, result-1406149080000, etc.

I think this is the timestamp for the beginning of each batch. How can we
access this timestamp and use it in our code? Thanks!

Bill

Re: Get Spark Streaming timestamp

Posted by Bill Jay <bi...@gmail.com>.
Hi Tobias,

It seems this parameter is an input to the function. What I am expecting is
a way to obtain, from within my code, the starting or ending time of each
batch. For instance, if I use saveAsTextFiles with a 60-second batch
interval, it seems DStream generates a batch every minute and the starting
time falls on a whole minute. Thanks!

Bill
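[Editor's note: the minute alignment Bill observes follows from Spark Streaming starting batches on multiples of the batch duration since the Unix epoch. A minimal, Spark-free sketch of that arithmetic; the function name batch_boundary is illustrative, not a Spark API:]

```python
BATCH_MS = 60_000  # a 60-second batch interval, in milliseconds

def batch_boundary(ts_ms: int) -> int:
    """Floor an epoch-millisecond timestamp to the start of its batch."""
    return ts_ms - ts_ms % BATCH_MS

# The filename suffixes in this thread sit exactly on such boundaries:
print(batch_boundary(1406148960000))  # 1406148960000 (already aligned)
print(batch_boundary(1406148999123))  # 1406148960000
```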


On Wed, Jul 23, 2014 at 6:56 PM, Tobias Pfeiffer <tg...@preferred.jp> wrote:

> Bill,
>
> Spark Streaming's DStream provides overloaded methods for transform() and
> foreachRDD() that allow you to access the timestamp of a batch:
>
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.DStream
>
> I think the timestamp is the end of the batch, not the beginning. For
> example, I compute the runtime by taking the difference between now() and
> the time I get as a parameter in foreachRDD().
>
> Tobias
>
>
>
> On Thu, Jul 24, 2014 at 6:39 AM, Bill Jay <bi...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I have a question regarding Spark Streaming. When I use the
>> saveAsTextFiles function with a 60-second batch interval, Spark generates
>> a series of files such as:
>>
>> result-1406148960000, result-1406149020000, result-1406149080000, etc.
>>
>> I think this is the timestamp for the beginning of each batch. How can we
>> access this timestamp and use it in our code? Thanks!
>>
>> Bill
>>
>
>

Re: Get Spark Streaming timestamp

Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Bill,

Spark Streaming's DStream provides overloaded methods for transform() and
foreachRDD() that allow you to access the timestamp of a batch:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.DStream

I think the timestamp is the end of the batch, not the beginning. For
example, I compute the runtime by taking the difference between now() and
the time I get as a parameter in foreachRDD().

Tobias
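[Editor's note: as an illustration of the pattern Tobias describes, the Scala API's foreachRDD((rdd, time) => ...) overload hands the callback a Time whose milliseconds field is the batch timestamp. The Spark-free sketch below only mimics the body of such a callback; handle_batch and its arguments are placeholders, not Spark names:]

```python
import time

def handle_batch(batch_time_ms: int, records: list) -> int:
    """Mimic a foreachRDD callback body: compare the batch timestamp the
    framework would pass in against the current wall clock, as Tobias does
    to measure runtime."""
    now_ms = int(time.time() * 1000)
    lag_ms = now_ms - batch_time_ms
    # ... process `records` here, e.g. save them under a name derived from
    # batch_time_ms, mirroring the result-<timestamp> files ...
    return lag_ms
```

Whether the timestamp marks the start or the end of the batch is worth checking against the filenames saveAsTextFiles produces, since the two posters disagree on this point.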



On Thu, Jul 24, 2014 at 6:39 AM, Bill Jay <bi...@gmail.com>
wrote:

> Hi all,
>
> I have a question regarding Spark Streaming. When I use the
> saveAsTextFiles function with a 60-second batch interval, Spark generates
> a series of files such as:
>
> result-1406148960000, result-1406149020000, result-1406149080000, etc.
>
> I think this is the timestamp for the beginning of each batch. How can we
> access this timestamp and use it in our code? Thanks!
>
> Bill
>