Posted to commits@samza.apache.org by "Jagadish (JIRA)" <ji...@apache.org> on 2016/07/16 00:01:34 UTC

[jira] [Updated] (SAMZA-974) Build an end-of-stream concept into Samza

     [ https://issues.apache.org/jira/browse/SAMZA-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jagadish updated SAMZA-974:
---------------------------
    Description: 
Samza currently works only with unbounded data sources. Bounded data sources, such as HDFS files or snapshot files, are finite. Samza therefore needs a notion of 'end-of-stream' so it can tell when such a source has been fully consumed.
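
The distinction matters because an empty poll on an unbounded source only means "no data yet". From that alone, a consumer cannot conclude that a partition is finished. A bounded source does know when it has been drained. The sketch below shows this with a BoundedPartitionReader over an in-memory list, which stands in for one finite file. The class and its methods are hypothetical illustrations and are not part of Samza's API.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical reader over one finite partition (for example, the records of a
// single HDFS file held in memory). Illustrative only; not a Samza API.
class BoundedPartitionReader {
    private final List<String> records;
    private int position = 0;

    BoundedPartitionReader(List<String> records) {
        this.records = records;
    }

    // Returns the next record, or Optional.empty() if none is available.
    Optional<String> poll() {
        if (position < records.size()) {
            return Optional.of(records.get(position++));
        }
        return Optional.empty();
    }

    // True once every record has been handed out. An unbounded source has no
    // equivalent: an empty poll there only means "no data yet".
    boolean isEndOfStream() {
        return position >= records.size();
    }
}
```

Here an empty poll() alone would be ambiguous. Only isEndOfStream() lets the caller distinguish "drained" from "nothing yet", and that is the signal the proposal asks a SystemConsumer to surface for each SSP.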

The following are the logical tasks:
1. A SystemConsumer will indicate to Samza that the end of stream has been reached for an SSP (SystemStreamPartition).
2. Samza will shut down a task once all SSPs assigned to it are at end of stream.
3. Samza will provide a callback to the task so that it can perform cleanup or commits once it reaches end of stream.
4. Samza will shut down a container once all tasks in it have been shut down.
5. Samza will ultimately shut down the job once all containers in it have been shut down.
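
The shutdown cascade in the steps above can be modeled as a small tracker. This is a minimal Java sketch that rests on several assumptions. SSPs, tasks and containers are plain strings. The per-task end-of-stream callback is a java.util.function.Consumer. The EndOfStreamTracker class and its addTask, markEndOfStream and isJobFinished methods are hypothetical. They are not Samza's API and not the design adopted for this ticket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical model of the proposed shutdown cascade. SSPs, tasks and
// containers are plain strings; none of these types are Samza APIs.
class EndOfStreamTracker {
    private final Map<String, Set<String>> pendingSsps = new HashMap<>(); // task -> SSPs not yet at EOS
    private final Map<String, String> containerOf = new HashMap<>();      // task -> owning container
    private final Map<String, Set<String>> liveTasks = new HashMap<>();   // container -> running tasks
    private final Consumer<String> onEndOfStream;                         // hypothetical task callback
    final List<String> events = new ArrayList<>();                        // shutdown log, for illustration

    EndOfStreamTracker(Consumer<String> onEndOfStream) {
        this.onEndOfStream = onEndOfStream;
    }

    void addTask(String container, String task, List<String> ssps) {
        pendingSsps.put(task, new HashSet<>(ssps));
        containerOf.put(task, container);
        liveTasks.computeIfAbsent(container, c -> new HashSet<>()).add(task);
    }

    // Step 1: the consumer reports that one SSP of a task has reached end of stream.
    void markEndOfStream(String task, String ssp) {
        Set<String> pending = pendingSsps.get(task);
        if (pending == null || !pending.remove(ssp) || !pending.isEmpty()) {
            return; // unknown or already-stopped task, duplicate signal, or SSPs still outstanding
        }
        pendingSsps.remove(task);
        // Step 3: let the task clean up or commit before it stops.
        onEndOfStream.accept(task);
        // Step 2: every SSP of the task is at end of stream, so the task shuts down.
        events.add("task-stopped:" + task);
        String container = containerOf.get(task);
        Set<String> running = liveTasks.get(container);
        running.remove(task);
        // Step 4: a container with no running tasks shuts down.
        if (running.isEmpty()) {
            liveTasks.remove(container);
            events.add("container-stopped:" + container);
        }
    }

    // Step 5: the job is finished once no container is still running.
    boolean isJobFinished() {
        return liveTasks.isEmpty();
    }
}
```

Consider a container c1 running task-0 (ssp-0, ssp-1) and task-1 (ssp-2). It stops only after all three SSPs have signalled end of stream. A repeated signal for a task that has already stopped is ignored.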

This is a step towards realizing a 'finite' Samza job that terminates (as opposed to an infinite stream job that keeps running) once data processing is complete.

  was:
Samza currently works only with unbounded data sources. Bounded data sources, such as HDFS files or snapshot files, are finite. Samza therefore needs a notion of 'end-of-stream' so it can tell when such a source has been fully consumed.

The following are the logical tasks:
1. A SystemConsumer will indicate to Samza that the end of stream has been reached for an SSP (SystemStreamPartition).
2. Samza will shut down a task once all SSPs assigned to it are at end of stream.
3. Samza will shut down a container once all tasks in it have been shut down.
4. Samza will ultimately shut down the job once all containers in it have been shut down.

This is a step towards realizing a 'finite' Samza job that terminates (as opposed to an infinite stream job that keeps running) once data processing is complete.


> Build an end-of-stream concept into Samza
> -----------------------------------------
>
>                 Key: SAMZA-974
>                 URL: https://issues.apache.org/jira/browse/SAMZA-974
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jagadish
>            Assignee: Jagadish
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)