You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Jingsong Lee (JIRA)" <ji...@apache.org> on 2017/03/03 06:33:54 UTC

[jira] [Created] (BEAM-1612) Support real Bundle in Flink runner

Jingsong Lee created BEAM-1612:
----------------------------------

             Summary: Support real Bundle in Flink runner
                 Key: BEAM-1612
                 URL: https://issues.apache.org/jira/browse/BEAM-1612
             Project: Beam
          Issue Type: Improvement
          Components: runner-flink
            Reporter: Jingsong Lee
            Assignee: Jingsong Lee


The Bundle is very important in the beam model. Users can use the bundle to flush buffer, can reuse many heavyweight resources in a bundle. Most IO plugins use the bundle to flush. 

Moreover, FlinkRunner can also use Bundle to reduce access to the FlinkState, such as first placed in JavaHeap, flush into RocksDbState when invoke finishBundle , this can reduce the number of serialization.

But now FlinkRunner calls the finishBundle every processElement. We need support real Bundle.

I think we can have the following implementations:

1.Invoke finishBundle and next startBundle in {{snapshot}} of Flink. But sometimes this "Bundle" maybe too big. This depends on the user's checkpoint configuration.

2.Manually control the size of the bundle. The half-bundle will be flushed to a full-bundle by count or eventTime or processTime or {{snapshot}}. We do not need to wait, just call the startBundle and finishBundle at the right time.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)