You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/06/13 19:54:00 UTC
[jira] [Commented] (BEAM-2398) Increasing latency within
DirectRunner caused by cumulated TransformWatermarks
[ https://issues.apache.org/jira/browse/BEAM-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16048317#comment-16048317 ]
ASF GitHub Bot commented on BEAM-2398:
--------------------------------------
GitHub user tgroh opened a pull request:
https://github.com/apache/beam/pull/3353
[BEAM-2398] Do not produce Unprocessed Inputs if all inputs were Processed
Be sure to do all of the following to help us incorporate your contribution
quickly and easily:
- [ ] Make sure the PR title is formatted like:
`[BEAM-<Jira issue #>] Description of pull request`
- [ ] Make sure tests pass via `mvn clean verify`.
- [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
number, if there is one.
- [ ] If this contribution is large, please file an Apache
[Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
---
This stops the WatermarkManager "Pending Bundles" from growing without
bound.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tgroh/beam empty_additional_inputs
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/3353.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3353
----
commit 5fa061310ce8f00cb13d461af240357269ba923b
Author: Thomas Groh <tg...@google.com>
Date: 2017-06-13T19:50:58Z
Do not produce Unprocessed Inputs if all inputs were Processed
This stops the WatermarkManager "Pending Bundles" from growing without
bound.
----
> Increasing latency within DirectRunner caused by cumulated TransformWatermarks
> ------------------------------------------------------------------------------
>
> Key: BEAM-2398
> URL: https://issues.apache.org/jira/browse/BEAM-2398
> Project: Beam
> Issue Type: Bug
> Components: runner-direct
> Affects Versions: 2.0.0
> Reporter: Antony Mayi
> Assignee: Thomas Groh
> Attachments: LatencyTest.java
>
>
> Over the time the end-to-end latency of a pipeline running on DirectRunner is significantly increasing.
> This is caused by ever growing sets of:
> * {{WatermarkManager.TransformWatermarks.inputWatermark.pendingElements}}
> * {{WatermarkManager.TransformWatermarks.synchronizedProcessingInputWatermark.pendingBundles}}
> That means calls to {{WatermarkManager.TransformWatermarks.refresh()}} which need to iterate through that collections take longer and longer and the latency is growing.
> I believe it is the line {{WaterMark.updatePending()}} line:
> {quote}
> if (input != null) {
> // Add the unprocessed inputs
> completedTransform.addPending(result.getUnprocessedInputs());
> {quote}
> that's adding the items that are never removed.
> See attached [^LatencyTest.java] demo code showing the increasing latency.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)