Posted to issues@spark.apache.org by "liushuo (Jira)" <ji...@apache.org> on 2020/01/20 03:33:00 UTC

[jira] [Updated] (SPARK-30576) Whether to block streaming batch commit, merge all blocking batches as one batch commit

     [ https://issues.apache.org/jira/browse/SPARK-30576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liushuo updated SPARK-30576:
----------------------------
    Description: 
When the current job has not completed, block the commit of new streaming batches until it finishes. The next job then merges all of the batches that arrived during the blocking into a single batch.
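
A rough sketch of the proposed merge-on-block policy, purely for illustration (this is not Spark's JobScheduler; the names MergeOnBlockSketch, onBatchTime, onBatchCompleted, pendingData and submit are invented for the sketch):

{code:scala}
import scala.collection.mutable

// Toy model of the proposal: while a job is running, new batch data is buffered
// instead of being committed; when the job finishes, everything buffered is
// merged with the next interval's input and submitted as a single batch.
object MergeOnBlockSketch {
  private val pendingData = mutable.Buffer.empty[Seq[Int]]
  @volatile private var jobRunning = false

  // Called once per batch interval with that interval's input.
  def onBatchTime(input: Seq[Int])(submit: Seq[Int] => Unit): Unit = synchronized {
    if (jobRunning) {
      pendingData += input                                  // block the commit, remember the data
    } else {
      val merged = pendingData.flatten.toSeq ++ input       // merge everything that piled up
      pendingData.clear()
      jobRunning = true
      submit(merged)
    }
  }

  // Called when the running job completes.
  def onBatchCompleted(): Unit = synchronized { jobRunning = false }
}
{code}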

For example:

the input seq is [1, 2, 3, 4, 5, 6].

batch duration: 1s.

The 3rd batch takes a long time; normally the other batches complete quickly.

We expect:

      1. the 4th batch is not committed while the 3rd batch is computing, and its data is merged into the next batch, so the size of jobSets always stays below 1 (no batch queues up behind the running job).

      2. the number of completedBatches is less than the size of the input seq.

      3. no data is lost.
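
A minimal, self-contained repro sketch of this scenario (assumptions: local mode, a queueStream standing in for the real source, and a Thread.sleep to make the 3rd batch slow; the object name and the timings are illustrative):

{code:scala}
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SlowThirdBatchRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("SPARK-30576-repro")
    val ssc = new StreamingContext(conf, Seconds(1))    // batch duration: 1s

    // One RDD per batch, carrying the elements of the input seq [1, 2, 3, 4, 5, 6].
    val queue = mutable.Queue((1 to 6).map(i => ssc.sparkContext.parallelize(Seq(i))): _*)
    val stream = ssc.queueStream(queue, oneAtATime = true)

    var batchIndex = 0                                  // driver-side counter
    stream.foreachRDD { rdd =>
      batchIndex += 1
      if (batchIndex == 3) Thread.sleep(10000)          // make the 3rd batch slow
      println(s"batch $batchIndex -> ${rdd.collect().mkString(",")}")
    }

    // With the proposed behaviour we would expect: while batch 3 runs, later batches
    // are not committed but merged into the next batch (jobSets stays empty), fewer
    // batches complete than there are elements in the seq, and no element is lost.
    ssc.start()
    ssc.awaitTerminationOrTimeout(30000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
{code}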

 

 

  was:When the current job has not completed, block the commit of new streaming batches until it finishes. The next job then merges all of the batches that arrived during the blocking into a single batch.


> Whether to block streaming batch commit, merge all blocking batches as one batch commit
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-30576
>                 URL: https://issues.apache.org/jira/browse/SPARK-30576
>             Project: Spark
>          Issue Type: New Feature
>          Components: DStreams, Structured Streaming
>    Affects Versions: 2.4.4
>            Reporter: liushuo
>            Priority: Major
>
> When the current job has not completed, block the commit of new streaming batches until it finishes. The next job then merges all of the batches that arrived during the blocking into a single batch.
> For example:
> the input seq is [1, 2, 3, 4, 5, 6].
> batch duration: 1s.
> The 3rd batch takes a long time; normally the other batches complete quickly.
> We expect:
>       1. the 4th batch is not committed while the 3rd batch is computing, and its data is merged into the next batch, so the size of jobSets always stays below 1 (no batch queues up behind the running job).
>       2. the number of completedBatches is less than the size of the input seq.
>       3. no data is lost.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
