You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by "Cameron Lee (Jira)" <ji...@apache.org> on 2021/06/30 23:12:00 UTC

[jira] [Assigned] (SAMZA-2303) Exclude side inputs when handling end-of-stream and watermarks for high-level

     [ https://issues.apache.org/jira/browse/SAMZA-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cameron Lee reassigned SAMZA-2303:
----------------------------------

    Assignee: Cameron Lee

> Exclude side inputs when handling end-of-stream and watermarks for high-level
> -----------------------------------------------------------------------------
>
>                 Key: SAMZA-2303
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2303
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Cameron Lee
>            Assignee: Cameron Lee
>            Priority: Major
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> OperatorImplGraph builds EndOfStreamStates and WatermarkStates objects with all of the input SSPs from the job model. That includes side-input SSPs. However, high-level operator tasks aren't given messages from side-input SSPs, so high-level operators should not need to include handling for end-of-stream and watermarks.
> The result of this issue is that end-of-stream and watermark handling tries to include side-inputs but never updates those states, which can result in not exiting properly (end-of-stream) and not correctly calculating watermarks.
> We currently have tests which use partitionBy and side-inputs, but they only use a single partition, so RunLoop is able to shutdown the task (RunLoop doesn't check side inputs when determining if the task is at the end of all streams). Normally, OperatorImpl will shut down the task when using high-level, and I think changing OperatorImpl to do ignore side input SSPs so that it does shut down the task is the fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)