Posted to commits@beam.apache.org by "Mark Liu (JIRA)" <ji...@apache.org> on 2018/08/08 16:34:00 UTC

[jira] [Created] (BEAM-5108) Python test framework should prevent streaming pipeline leaks

Mark Liu created BEAM-5108:
------------------------------

             Summary: Python test framework should prevent streaming pipeline leaks
                 Key: BEAM-5108
                 URL: https://issues.apache.org/jira/browse/BEAM-5108
             Project: Beam
          Issue Type: Task
          Components: testing
            Reporter: Mark Liu


Recently, a few Python streaming pipelines in the Dataflow apache-beam-testing project ran for more than 5 days. This looks like a leak from the Jenkins job that runs e2e integration tests.

The test framework has a pipeline resource cleanup step that applies to all integration tests, defined in [TestDataflowRunner|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py#L67]. However, the cancellation may fail in special cases such as the following (from [this Jenkins run|https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Python_Verify/5636/consoleFull]):
{quote}
Workflow modification failed. Causes: (c53cc746f7bc7f49): Operation cancel not allowed for job 2018-08-01_13_10_24-5019826606522054507. Job is not yet ready for canceling. Please retry in a few minutes.
{quote}

Two possible approaches to improve the test infra:
1. Add retries to the framework's cancellation.
2. Instead of waiting only until the pipeline is in the RUNNING state ([here|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py#L57]), wait longer to make sure the worker pool has started successfully.
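
As a rough illustration of approach 1, retrying the cancellation could look something like the sketch below. This is not the TestDataflowRunner code: cancel_with_retry, CancelNotReadyError, cancel_fn and the delay parameters are all hypothetical names chosen for the example. A real change would need to detect whatever error the Dataflow client actually raises for "Job is not yet ready for canceling".

```python
import time


class CancelNotReadyError(Exception):
    """Hypothetical stand-in for a 'not yet ready for canceling' failure."""


def cancel_with_retry(cancel_fn, max_attempts=5, initial_delay_secs=1.0,
                      backoff=2.0, sleep_fn=time.sleep):
    """Call cancel_fn, retrying with exponential backoff while it is not ready.

    cancel_fn is any zero-argument callable that cancels the job (an
    assumption for this sketch). The last failure is re-raised once
    max_attempts calls have been made.
    """
    delay = initial_delay_secs
    for attempt in range(1, max_attempts + 1):
        try:
            return cancel_fn()
        except CancelNotReadyError:
            if attempt == max_attempts:
                raise  # Give up so that the leak is at least visible.
            sleep_fn(delay)
            delay *= backoff
```

The sleep function is injectable only so that the helper can be exercised in unit tests without real waiting. Since the error message suggests retrying "in a few minutes", the delays used in practice would presumably need to be much longer than the defaults shown here.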



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)