Posted to commits@samza.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2015/04/01 00:24:53 UTC

[jira] [Commented] (SAMZA-552) Tuple or time window semantics in physical operator

    [ https://issues.apache.org/jira/browse/SAMZA-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389529#comment-14389529 ] 

Chris Riccomini commented on SAMZA-552:
---------------------------------------

The overall design/concept looks good to me. I'm mostly interested in the API, which is listed at the very end of the doc. Is that API meant to be used by users, or by a query layer? I found that requiring both time and offsets in this API is a little confusing.

> Tuple or time window semantics in physical operator
> ---------------------------------------------------
>
>                 Key: SAMZA-552
>                 URL: https://issues.apache.org/jira/browse/SAMZA-552
>             Project: Samza
>          Issue Type: Sub-task
>          Components: sql
>    Affects Versions: 0.9.0
>            Reporter: Yi Pan (Data Infrastructure)
>            Assignee: Yi Pan (Data Infrastructure)
>         Attachments: DESIGN-SAMZA-552-3.md, DESIGN-SAMZA-552-3.pdf, DESIGN-SAMZA-552-6.md, DESIGN-SAMZA-552-6.pdf
>
>
> This discussion covers how to support tuple-based and/or time-based window operators in the Samza physical operator layer.
> Here are a few observations:
> # Tuples represent the “physical ordering” of events, while a time-based window has semantic meaning to users
> # Total ordering between tuples is possible within Samza/Kafka given a deterministic MessageSelector on all input streams and offsets within each stream
> # Whether tuples or time are used to measure the window size, a window termination condition is needed to close the window and keep the job from being wedged forever
> The following questions have to be answered to fully implement a window operator:
> # how to determine that a window is closed and no new tuples will be added?
> ## For tuple based, how do we close the window if messages do not come or get delayed?
> ## For time based, how do we close the window if
> ### the messages are not strictly in order w/ the time?
> ### the message w/ timestamp greater than the window boundary does not come or gets delayed?
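
To make the time-based questions above concrete, here is a minimal sketch of one possible termination rule. It is not the design from the attached docs and not Samza's operator API; the class name, its fields, and the two knobs (allowed_lateness_ms, idle_timeout_ms) are all assumptions made for illustration. Out-of-order tuples are tolerated up to a lateness bound, and a wall-clock timeout force-closes a window when the tuple that would normally push time past its boundary never arrives.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Closed = Tuple[int, List[str]]  # (window start, values in arrival order)


@dataclass
class TimeWindowOperator:
    """Hypothetical tumbling time window; names and fields are illustrative only."""
    size_ms: int               # window length, in event time
    allowed_lateness_ms: int   # how far out of order a tuple may arrive
    idle_timeout_ms: int       # wall-clock limit before a window is force-closed
    watermark: int = -1        # event time before which windows are considered closed
    windows: Dict[int, List[str]] = field(default_factory=dict)
    opened_at: Dict[int, int] = field(default_factory=dict)

    def process(self, event_ts: int, value: str, now_ms: int) -> List[Closed]:
        start = event_ts - event_ts % self.size_ms
        if start + self.size_ms > self.watermark:
            # The window is still open: accept the tuple even if it is out of order.
            self.windows.setdefault(start, []).append(value)
            self.opened_at.setdefault(start, now_ms)
        # Otherwise the window has already closed and the late tuple is dropped.
        self.watermark = max(self.watermark, event_ts - self.allowed_lateness_ms)
        return self.tick(now_ms)

    def tick(self, now_ms: int) -> List[Closed]:
        # Timeout rule: a window open for too long (wall clock) pulls the
        # watermark up to its end, so a missing or delayed tuple cannot wedge the job.
        for start, opened in self.opened_at.items():
            if now_ms - opened >= self.idle_timeout_ms:
                self.watermark = max(self.watermark, start + self.size_ms)
        closed: List[Closed] = []
        for start in sorted(self.windows):
            if start + self.size_ms <= self.watermark:
                closed.append((start, self.windows.pop(start)))
                del self.opened_at[start]
        return closed
```

In this sketch tick() would be called periodically with the current wall-clock time, which covers the case where no further messages arrive at all. A tuple-based window could presumably reuse the same timeout rule, with a tuple count in place of the event-time boundary.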



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)