You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Filip Niksic (Jira)" <ji...@apache.org> on 2019/11/05 19:14:00 UTC

[jira] [Created] (FLINK-14616) Clarify the ordering guarantees in the "The Broadcast State Pattern"

Filip Niksic created FLINK-14616:
------------------------------------

             Summary: Clarify the ordering guarantees in the "The Broadcast State Pattern"
                 Key: FLINK-14616
                 URL: https://issues.apache.org/jira/browse/FLINK-14616
             Project: Flink
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 1.9.1
            Reporter: Filip Niksic


When talking about the order of events in [The Broadcast State Pattern|https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/stream/state/broadcast_state.html#important-considerations], the current documentation states that the downstream tasks must not assume the broadcast events to be ordered. However, this seems to be imprecise. According to the response I got from [~fhueske] to a [question|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Ordered-events-in-broadcast-state-tp30879.html] I sent to the Flink user mailing list:
{quote}The order of broadcasted inputs is not guaranteed when the operator that broadcasts its output has a parallelism > 1 because the tasks that receive the broadcasted input consume the records in "random" order from their input channels.
{quote}
In particular, when the parallelism of the broadcasting operator is 1, the order _is_ guaranteed.

[~fhueske] continues with his suggestions on how to ensure the correct ordering of the broadcast events:
{quote}So there are two approaches:
1) make the operator that broadcasts its output run as an operator with parallelism 1 (or add a MapOperator with parallelism 1 that just forwards its input). This will cause all broadcasted records to go through the same network channel and their order is guaranteed on each receiver.
2) use timestamps of broadcasted records for ordering and watermarks to reason about completeness.

If the broadcasted data is (comparatively) small in volume (which is usually given because otherwise broadcasting would be expensive), I'd go with the first option.
The second approach is more difficult to implement.
{quote}
It would be great if the ordering guarantees could be clarified to avoid confusion. This could be achieved by simply expanding the paragraph that talks about the order of events in the "important considerations" section. More ambitiously, the suggestions given by [~fhueske] could be turned into examples.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)