You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by "Cameron Lee (Jira)" <ji...@apache.org> on 2019/12/14 02:32:00 UTC

[jira] [Commented] (SAMZA-2303) Exclude side inputs when handling end-of-stream and watermarks for high-level

    [ https://issues.apache.org/jira/browse/SAMZA-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16996087#comment-16996087 ] 

Cameron Lee commented on SAMZA-2303:
------------------------------------

I think this can be fixed by adding a new field to TaskContextImpl which is "non side-input SSPs", and then make that field available to OperatorGraphImpl through InternalTaskContext, so that it can build the EndOfStreamStates using those SSPs instead of the SSPs from the task model.

Unfortunately, it will add another method which casts TaskContext to TaskContextImpl inside InternalTaskContext.

> Exclude side inputs when handling end-of-stream and watermarks for high-level
> -----------------------------------------------------------------------------
>
>                 Key: SAMZA-2303
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2303
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Cameron Lee
>            Priority: Major
>
> OperatorImplGraph builds EndOfStreamStates and WatermarkStates objects with all of the input SSPs from the job model. That includes side-input SSPs. However, high-level operator tasks aren't given messages from side-input SSPs, so high-level operators should not need to include handling for end-of-stream and watermarks.
> The result of this issue is that end-of-stream and watermark handling tries to include side-inputs but never updates those states, which can result in not exiting properly (end-of-stream) and not correctly calculating watermarks.
> We currently have tests which use partitionBy and side-inputs, but they only use a single partition, so RunLoop is able to shutdown the task (RunLoop doesn't check side inputs when determining if the task is at the end of all streams). Normally, OperatorImpl will shut down the task when using high-level, and I think changing OperatorImpl to do ignore side input SSPs so that it does shut down the task is the fix.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)