Posted to user@spark.apache.org by Yuta Morisawa <yu...@kddi-research.jp> on 2018/05/16 00:38:48 UTC

Continuous Processing mode behaves differently from Batch mode

Hi all

I am using Structured Streaming in Continuous Processing mode and I 
ran into an odd problem.

My code is very simple, similar to the sample code in the 
documentation.
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#continuous-processing


When I send the same text data ten times (for example, a 10-line text), 
the result in Batch mode has 100 lines.

But in Continuous Processing mode the result has only 10 lines.
It appears that duplicated lines are removed.

The only difference between the two programs is whether the trigger 
method is called.

Why do these two programs behave differently?
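
For reference, a minimal PySpark sketch of the "with or without trigger" difference described above. The original code was not posted, so everything here is an assumption: the helper names trigger_kwargs and start_query are hypothetical, the console sink is chosen only for illustration, and the "1 second" checkpoint interval follows the documentation sample rather than anything in this thread.

```python
def trigger_kwargs(mode):
    """Return keyword arguments for DataStreamWriter.trigger (hypothetical helper)."""
    if mode == "continuous":
        # Continuous Processing: the value is the checkpoint interval.
        return {"continuous": "1 second"}
    if mode == "micro-batch":
        # Default micro-batch mode ("Batch mode" above): no trigger call at all.
        return {}
    raise ValueError("unknown mode: " + mode)


def start_query(df, mode):
    """Start a console query on a streaming DataFrame `df` in the given mode."""
    writer = df.writeStream.format("console")
    kwargs = trigger_kwargs(mode)
    if kwargs:
        # Only the continuous variant calls trigger(); this is the one-line difference.
        writer = writer.trigger(**kwargs)
    return writer.start()
```

With a streaming DataFrame df, start_query(df, "micro-batch") and start_query(df, "continuous") would then differ only in the trigger call.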


--
Regards,
Yuta


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Continuous Processing mode behaves differently from Batch mode

Posted by Yuta Morisawa <yu...@kddi-research.jp>.
Thank you for your reply.

I checked the web UI and found that the total number of tasks is 10.
So I changed the number of cores from 1 to 10, and then it worked well.

But I haven't figured out what is happening.

My assumption is that each job consists of 10 tasks by default and each 
task occupies 1 core.
So, in my case, assigning only 1 core caused the issue.
In other words, Continuous mode needs at least 10 cores.

Is that right?
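
The Structured Streaming guide says that a continuous processing query launches long-running tasks, that the number of tasks depends on how many source partitions can be read in parallel, and that the cluster needs at least that many cores for the query to make progress. A toy model in plain Python (no Spark involved) shows how that could produce the 10-versus-100 result. The even spread of 100 lines over 10 partitions and both function names are assumptions made for illustration; the thread does not say which source was used or how the data was partitioned.

```python
def continuous_lines(partitions, cores):
    """Lines processed when every partition needs its own long-running task."""
    # A continuous task never finishes, so it keeps its core for the whole query.
    # Partitions whose tasks never get a core are never read.
    running = partitions[:cores]
    return sum(len(p) for p in running)


def micro_batch_lines(partitions, cores):
    """Lines processed when tasks are short-lived and release their core."""
    if cores < 1:
        return 0
    # Tasks finish, so even one core eventually works through every partition.
    return sum(len(p) for p in partitions)


# Assumed layout: 100 lines spread evenly over 10 partitions.
lines = ["line-%d" % i for i in range(100)]
partitions = [lines[i::10] for i in range(10)]

print(continuous_lines(partitions, cores=1))    # 10
print(continuous_lines(partitions, cores=10))   # 100
print(micro_batch_lines(partitions, cores=1))   # 100
```

Under this model the requirement is not a fixed default of 10 cores; it is one core per task, and the task count follows the number of source partitions.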


Regards,
Yuta

On 2018/05/16 15:24, Shixiong(Ryan) Zhu wrote:
> One possible case is you don't have enough resources to launch all tasks 
> for your continuous processing query. Could you check the Spark UI and 
> see if all tasks are running rather than waiting for resources?



Re: Continuous Processing mode behaves differently from Batch mode

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
One possible cause is that you don't have enough resources to launch all
tasks for your continuous processing query. Could you check the Spark UI
and see if all tasks are running rather than waiting for resources?

Best Regards,
Shixiong Zhu
Databricks Inc.
shixiong@databricks.com

