Posted to commits@beam.apache.org by "Antony Mayi (JIRA)" <ji...@apache.org> on 2017/06/01 16:19:04 UTC

[jira] [Created] (BEAM-2398) Increasing latency within DirectRunner caused by cumulated TransformWatermarks

Antony Mayi created BEAM-2398:
---------------------------------

             Summary: Increasing latency within DirectRunner caused by cumulated TransformWatermarks
                 Key: BEAM-2398
                 URL: https://issues.apache.org/jira/browse/BEAM-2398
             Project: Beam
          Issue Type: Bug
          Components: runner-direct
    Affects Versions: 2.0.0
            Reporter: Antony Mayi
            Assignee: Thomas Groh
         Attachments: LatencyTest.java

Over time, the end-to-end latency of a pipeline running on the DirectRunner increases significantly.

This is caused by two ever-growing collections:
* {{WatermarkManager.TransformWatermarks.inputWatermark.pendingElements}}
* {{WatermarkManager.TransformWatermarks.synchronizedProcessingInputWatermark.pendingBundles}}

As a result, calls to {{WatermarkManager.TransformWatermarks.refresh()}}, which must iterate through those collections, take longer and longer, and the latency keeps growing.
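To illustrate why this hurts, here is a minimal toy sketch, not Beam code. The class {{PendingGrowthSketch}} and its methods are hypothetical names made up for this example. It assumes a refresh has to scan every pending element to find the earliest timestamp, and counts how many elements the last refresh visits when nothing is ever removed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not Beam's WatermarkManager.
public class PendingGrowthSketch {
    private final List<Long> pendingTimestamps = new ArrayList<>();
    private long visited = 0;

    // Mirrors the reported pattern: elements are added but never removed.
    void addPending(long timestamp) {
        pendingTimestamps.add(timestamp);
    }

    // A refresh that scans every pending element to find the earliest one.
    long refresh() {
        long min = Long.MAX_VALUE;
        for (long ts : pendingTimestamps) {
            visited++;
            min = Math.min(min, ts);
        }
        return min;
    }

    long visitedSoFar() {
        return visited;
    }

    // Number of elements visited by the final refresh after `bundles`
    // bundles, each contributing one pending element.
    static long costOfLastRefresh(int bundles) {
        PendingGrowthSketch w = new PendingGrowthSketch();
        long last = 0;
        for (int i = 0; i < bundles; i++) {
            w.addPending(i);
            long before = w.visitedSoFar();
            w.refresh();
            last = w.visitedSoFar() - before;
        }
        return last;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " bundles: last refresh visited "
                + costOfLastRefresh(n) + " elements");
        }
    }
}
```

In this model the cost of each refresh is proportional to the number of bundles processed so far, which would match the steadily increasing latency described above.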

I believe the culprit is this line in {{WaterMark.updatePending()}}:

{quote}
    if (input != null) {
      // Add the unprocessed inputs
      completedTransform.addPending(result.getUnprocessedInputs());
{quote}
This adds items that are never removed.
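The suspected add-without-remove pattern can be sketched in isolation as follows. This is only an illustration under my own assumptions: {{PendingBookkeepingSketch}}, {{addPending}}, {{removePending}} and {{pendingAfter}} are hypothetical names, and the code does not reproduce the actual DirectRunner logic or propose a specific fix. It just contrasts the size of a pending set when completed elements are dropped with the size when they are not.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: hypothetical names, not Beam's WatermarkManager.
public class PendingBookkeepingSketch {
    private final Set<String> pending = new HashSet<>();

    void addPending(String element) {
        pending.add(element);
    }

    void removePending(String element) {
        pending.remove(element);
    }

    int size() {
        return pending.size();
    }

    // Simulates `bundles` steps. Each step registers one element as pending
    // and then completes it. When removeOnCompletion is false, the completed
    // element stays in the set, as in the behaviour suspected in this report.
    static int pendingAfter(int bundles, boolean removeOnCompletion) {
        PendingBookkeepingSketch w = new PendingBookkeepingSketch();
        for (int i = 0; i < bundles; i++) {
            String element = "element-" + i;
            w.addPending(element);
            if (removeOnCompletion) {
                w.removePending(element);
            }
        }
        return w.size();
    }

    public static void main(String[] args) {
        System.out.println("never removed: " + pendingAfter(1000, false));
        System.out.println("removed on completion: " + pendingAfter(1000, true));
    }
}
```

Without the matching removal the set grows by one entry per step, whereas balanced bookkeeping keeps it empty between steps.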

See attached demo code showing the increasing latency.


