Posted to user@spark.apache.org by madhu phatak <ph...@gmail.com> on 2016/05/06 08:07:21 UTC

Spark structured streaming is Micro batch?

Hi,
As I was playing with the new Structured Streaming API, I noticed that Spark
starts processing as soon as the data appears. It no longer seems like
micro-batch processing. Will Spark Structured Streaming be event-based
processing?

-- 
Regards,
Madhukara Phatak
http://datamantra.io/

Re: Spark structured streaming is Micro batch?

Posted by madhu phatak <ph...@gmail.com>.
Hi,
Thank you for all those answers.

Below is the code I am trying out:

// Import paths assumed for the Spark 2.0 preview builds; ProcessingTime later
// moved to org.apache.spark.sql.streaming in the released 2.0.
import org.apache.spark.sql.ProcessingTime
import scala.concurrent.duration._

// Read CSV files appearing under /tmp/input as an unbounded (streaming) DataFrame.
val records = sparkSession.read.format("csv").stream("/tmp/input")

// Write each batch out as Parquet, asking for a 100-second processing-time trigger.
val re = records.write.format("parquet")
  .trigger(ProcessingTime(100.seconds))
  .option("checkpointLocation", "/tmp/checkpoint")
  .startStream("/tmp/output")

// Block until the streaming query terminates.
re.awaitTermination()


In the above code, I assumed the batch interval would be 100 seconds, but it
doesn't seem to behave that way.
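
For contrast, here is a minimal sketch of the same sink without any trigger
(same preview API as above; my understanding, which may be wrong for this
build, is that the default then behaves like ProcessingTime(0)):

// Assumption: with no trigger, the next batch is constructed as soon as the
// previous one returns, so the batch "size" is simply whatever data arrived
// while the last batch was running.
val noTrigger = records.write.format("parquet")
  .option("checkpointLocation", "/tmp/checkpoint-default")
  .startStream("/tmp/output-default")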


On Fri, May 6, 2016 at 3:14 PM, Sachin Aggarwal <di...@gmail.com>
wrote:

> Hi Madhukara,
>
> What I understood from the code is that whenever runBatch returns,
> constructBatch is triggered again, so the processing time of a batch
> becomes your batch interval if you don't specify a trigger.
>
> One flaw I see in this: if your processing time keeps increasing with the
> amount of data, then the batch interval keeps increasing as well; there
> should be some bound or blocking logic to prevent that.
>
> Here is one pull request I found related to this:
> https://github.com/apache/spark/pull/12725
>
>
> On Fri, May 6, 2016 at 2:50 PM, Deepak Sharma <de...@gmail.com>
> wrote:
>
>> With Structured Streaming, Spark provides APIs on top of the Spark SQL
>> engine. Once you have the structured stream and a DataFrame created from
>> it, you can do ad-hoc querying on the DF, which means you are actually
>> querying the stream without having to store or transform it first.
>> I have not used it yet, but it seems streaming from the source starts as
>> soon as you define it.
>>
>> Thanks
>> Deepak
>>
>>
>> On Fri, May 6, 2016 at 1:37 PM, madhu phatak <ph...@gmail.com>
>> wrote:
>>
>>> Hi,
>>> As I was playing with the new Structured Streaming API, I noticed that
>>> Spark starts processing as soon as the data appears. It no longer seems
>>> like micro-batch processing. Will Spark Structured Streaming be
>>> event-based processing?
>>>
>>> --
>>> Regards,
>>> Madhukara Phatak
>>> http://datamantra.io/
>>>
>>
>>
>>
>> --
>> Thanks
>> Deepak
>> www.bigdatabig.com
>> www.keosha.net
>>
>
>
>
> --
>
> Thanks & Regards
>
> Sachin Aggarwal
> 7760502772
>



-- 
Regards,
Madhukara Phatak
http://datamantra.io/

Re: Spark structured streaming is Micro batch?

Posted by Sachin Aggarwal <di...@gmail.com>.
Hi Madhukara,

What I understood from the code is that whenever runBatch returns,
constructBatch is triggered again, so the processing time of a batch becomes
your batch interval if you don't specify a trigger.

One flaw I see in this: if your processing time keeps increasing with the
amount of data, then the batch interval keeps increasing as well; there
should be some bound or blocking logic to prevent that.

Here is one pull request I found related to this:
https://github.com/apache/spark/pull/12725
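
To make that loop concrete, here is a rough sketch (simplified, not the
actual StreamExecution code; the method names just follow the description
above):

// With no trigger the loop never waits, so the effective batch interval is
// simply the processing time of the previous batch. A ProcessingTime trigger
// would instead make the loop wait for the next multiple of the interval
// before constructing another batch.
while (queryIsActive) {
  if (newDataAvailable) {
    constructBatch()   // pick the offsets/files that go into the next batch
    runBatch()         // execute it; this can take arbitrarily long
  }
}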


On Fri, May 6, 2016 at 2:50 PM, Deepak Sharma <de...@gmail.com> wrote:

> With Structured Streaming, Spark provides APIs on top of the Spark SQL
> engine. Once you have the structured stream and a DataFrame created from
> it, you can do ad-hoc querying on the DF, which means you are actually
> querying the stream without having to store or transform it first.
> I have not used it yet, but it seems streaming from the source starts as
> soon as you define it.
>
> Thanks
> Deepak
>
>
> On Fri, May 6, 2016 at 1:37 PM, madhu phatak <ph...@gmail.com> wrote:
>
>> Hi,
>> As I was playing with the new Structured Streaming API, I noticed that
>> Spark starts processing as soon as the data appears. It no longer seems
>> like micro-batch processing. Will Spark Structured Streaming be
>> event-based processing?
>>
>> --
>> Regards,
>> Madhukara Phatak
>> http://datamantra.io/
>>
>
>
>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
>



-- 

Thanks & Regards

Sachin Aggarwal
7760502772

Re: Spark structured streaming is Micro batch?

Posted by Deepak Sharma <de...@gmail.com>.
With Structured Streaming, Spark provides APIs on top of the Spark SQL engine.
Once you have the structured stream and a DataFrame created from it, you can
do ad-hoc querying on the DF, which means you are actually querying the
stream without having to store or transform it first.
I have not used it yet, but it seems streaming from the source starts as soon
as you define it.
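
As a rough illustration of that ad-hoc querying idea (a sketch only, using
the preview API from this thread; the column names "status" and "url" are
made up and I have not run this):

// The stream is just a DataFrame, so ordinary DataFrame operations such as
// filter and select apply directly; the query runs continuously as new files
// arrive in /tmp/input.
val records = sparkSession.read.format("csv").stream("/tmp/input")
val errors  = records.filter("status = '500'").select("url", "status")

val query = errors.write.format("parquet")
  .option("checkpointLocation", "/tmp/checkpoint-errors")
  .startStream("/tmp/errors")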

Thanks
Deepak


On Fri, May 6, 2016 at 1:37 PM, madhu phatak <ph...@gmail.com> wrote:

> Hi,
> As I was playing with the new Structured Streaming API, I noticed that Spark
> starts processing as soon as the data appears. It no longer seems like
> micro-batch processing. Will Spark Structured Streaming be event-based
> processing?
>
> --
> Regards,
> Madhukara Phatak
> http://datamantra.io/
>



-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net