Posted to commits@samza.apache.org by "Yi Pan (Data Infrastructure) (JIRA)" <ji...@apache.org> on 2015/02/10 01:17:39 UTC

[jira] [Created] (SAMZA-552) Tuple or time window semantics in physical operator

Yi Pan (Data Infrastructure) created SAMZA-552:
--------------------------------------------------

             Summary: Tuple or time window semantics in physical operator
                 Key: SAMZA-552
                 URL: https://issues.apache.org/jira/browse/SAMZA-552
             Project: Samza
          Issue Type: Sub-task
            Reporter: Yi Pan (Data Infrastructure)


This discussion concerns how to support tuple-based and/or time-based window operators in the Samza physical operator layer.

Here are a few observations:
# Tuple order represents the “physical ordering” of events, while time-based windows carry semantic meaning for users
# A total ordering of tuples is possible within Samza/Kafka, given a deterministic MessageSelector across all input streams and the offsets within each stream
# Whether tuples or time are used to measure the window size, a window termination condition is needed to close a window; otherwise the job can be wedged forever
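The total ordering in the second observation can be made concrete with a small sketch. It assumes each input stream can be read as an ordered list of messages and that the selector may inspect the head of every stream before choosing. `Envelope` and `DeterministicMerge` are hypothetical names, not Samza classes, and ordering by (timestamp, stream name, offset) is only one possible deterministic rule.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical message envelope; NOT Samza's IncomingMessageEnvelope.
final class Envelope {
    final String stream;
    final long offset;
    final long timestamp;

    Envelope(String stream, long offset, long timestamp) {
        this.stream = stream;
        this.offset = offset;
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        return stream + "@" + offset;
    }
}

// Sketch of a deterministic selector: always emits the stream head with the
// smallest (timestamp, stream name, offset). Only stream heads are compared,
// so each stream is consumed in offset order, and ties are broken by stream
// name, so the output depends on input contents alone, not on arrival timing.
final class DeterministicMerge {
    private static final Comparator<Envelope> ORDER =
        Comparator.comparingLong((Envelope e) -> e.timestamp)
                  .thenComparing((Envelope e) -> e.stream)
                  .thenComparingLong(e -> e.offset);

    static List<Envelope> merge(Map<String, List<Envelope>> streams) {
        // Copy each stream into a queue so the caller's lists are untouched.
        Map<String, Deque<Envelope>> heads = new TreeMap<>();
        streams.forEach((name, msgs) -> heads.put(name, new ArrayDeque<>(msgs)));

        List<Envelope> out = new ArrayList<>();
        while (true) {
            Envelope best = null;
            for (Deque<Envelope> q : heads.values()) {
                Envelope head = q.peekFirst();
                if (head != null && (best == null || ORDER.compare(head, best) < 0)) {
                    best = head;
                }
            }
            if (best == null) {
                return out; // every stream is drained
            }
            out.add(heads.get(best.stream).pollFirst());
        }
    }
}
```

Because each stream is consumed strictly in offset order, the merged output is not globally sorted by timestamp if a single stream carries out-of-order timestamps. The sketch also sidesteps the hard part of the third observation: a live selector that waits for every stream head blocks whenever one stream has no message, which is exactly why a termination condition is needed.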

The following questions have to be answered to fully implement a window operator:
# How do we determine that a window is closed and that no new tuples will be added to it?
## For tuple-based windows, how do we close the window if messages do not arrive or are delayed?
## For time-based windows, how do we close the window if
### messages are not strictly ordered by time?
### the message with a timestamp beyond the window boundary does not arrive or is delayed?
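The questions above are left open in this issue. To make them concrete, here is one possible closing policy, offered purely as an assumption: each size-based condition is paired with a processing-time timeout. `CountWindow` and `TimeWindow` are hypothetical names, and the lateness bound and idle timeout are assumed parameters, not Samza configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical window-closing policies; these classes are NOT Samza APIs.
// Event time comes from each message; `now` is processing (wall-clock) time
// supplied by the caller, e.g. from a periodic timer.

// Tuple-based window: closes after `size` tuples, or once `timeoutMs` of
// processing time has passed since it opened, even if tuples are missing.
final class CountWindow {
    private final int size;
    private final long timeoutMs;
    private final long openedAt;
    private final List<String> tuples = new ArrayList<>();

    CountWindow(int size, long timeoutMs, long now) {
        this.size = size;
        this.timeoutMs = timeoutMs;
        this.openedAt = now;
    }

    void add(String tuple) {
        tuples.add(tuple);
    }

    boolean shouldClose(long now) {
        return tuples.size() >= size || now - openedAt >= timeoutMs;
    }

    List<String> contents() {
        return tuples;
    }
}

// Time-based window [start, end). Assumes event times are out of order by at
// most `lateness`. Closes once an event at or past end + lateness has been
// seen, or when nothing has arrived for `idleTimeoutMs` of processing time.
final class TimeWindow {
    private final long start;
    private final long end;
    private final long lateness;
    private final long idleTimeoutMs;
    private final List<Long> events = new ArrayList<>();
    private long maxEventTime = Long.MIN_VALUE;
    private long lastArrival;
    private boolean closed;

    TimeWindow(long start, long end, long lateness, long idleTimeoutMs, long now) {
        this.start = start;
        this.end = end;
        this.lateness = lateness;
        this.idleTimeoutMs = idleTimeoutMs;
        this.lastArrival = now;
    }

    // Every message advances the observed event time, even one that belongs
    // to a later window. Returns false once this window has closed; the caller
    // must then decide what to do with a late message inside [start, end).
    boolean offer(long eventTime, long now) {
        if (closed) {
            return false;
        }
        lastArrival = now;
        maxEventTime = Math.max(maxEventTime, eventTime);
        if (eventTime >= start && eventTime < end) {
            events.add(eventTime);
        }
        return true;
    }

    boolean closeIfDone(long now) {
        if (!closed) {
            closed = maxEventTime >= end + lateness || now - lastArrival >= idleTimeoutMs;
        }
        return closed;
    }

    List<Long> contents() {
        return events;
    }
}
```

The lateness bound trades latency for completeness: a larger value tolerates more disorder but delays every window by that much event time. The idle timeout uses processing time, so a stalled stream cannot wedge the job, but a window may then close before a delayed message arrives. The caller could drive `shouldClose`/`closeIfDone` from a periodic callback such as Samza's `WindowableTask.window()`.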



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)