Posted to issues@beam.apache.org by "Hannah Jiang (Jira)" <ji...@apache.org> on 2019/09/01 04:28:00 UTC

[jira] [Commented] (BEAM-7993) portable python precommit is flaky

    [ https://issues.apache.org/jira/browse/BEAM-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920279#comment-16920279 ] 

Hannah Jiang commented on BEAM-7993:
------------------------------------

Thanks Mark for trying it out.

I listed the last 60 Python Portable Precommit tests from Jenkins to look for patterns. There was no obvious pattern; however, every test submitted to Jenkins agent12 failed. Failures are also more likely when multiple Python Portable Precommit tests run in parallel on the same agent, which means sdist is also running in parallel in that case. Even so, the failure still happens when only one Python Portable Precommit test runs on an agent, so parallel sdist runs alone cannot explain this issue. It is still worth investigating.

I attached a list of these 60 tests in case it helps.
[^Python_Portable_Precommit.pdf] 
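The tally described above (failures per Jenkins agent, and failures with vs. without another run overlapping on the same agent) can be sketched as below. This is a minimal sketch, assuming the build list has already been exported into simple records; `BuildRecord` and its fields are hypothetical and are not a Jenkins API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class BuildRecord:
    """One precommit run. Hypothetical shape, not a Jenkins API."""
    build_id: int
    agent: str      # e.g. "agent12"
    start: float    # start time, seconds since epoch
    end: float      # end time, seconds since epoch
    passed: bool


def failure_rate_by_agent(records: Iterable[BuildRecord]) -> Dict[str, Tuple[int, int]]:
    """Return {agent: (failures, total_runs)}."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for r in records:
        counts[r.agent][1] += 1
        if not r.passed:
            counts[r.agent][0] += 1
    return {agent: (f, t) for agent, (f, t) in counts.items()}


def overlapped(r: BuildRecord, records: List[BuildRecord]) -> bool:
    """True if any other run on the same agent overlapped r in time."""
    return any(
        o.build_id != r.build_id
        and o.agent == r.agent
        and o.start < r.end
        and r.start < o.end
        for o in records
    )


def failure_rate_by_concurrency(records: List[BuildRecord]) -> Dict[bool, Tuple[int, int]]:
    """Return {ran_in_parallel: (failures, total_runs)}."""
    counts = {True: [0, 0], False: [0, 0]}
    for r in records:
        key = overlapped(r, records)
        counts[key][1] += 1
        if not r.passed:
            counts[key][0] += 1
    return {k: (f, t) for k, (f, t) in counts.items()}
```

Comparing the two dictionaries separates "this agent is bad" from "parallel runs on one agent are bad", which is the distinction the comment above is trying to draw.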

> portable python precommit is flaky
> ----------------------------------
>
>                 Key: BEAM-7993
>                 URL: https://issues.apache.org/jira/browse/BEAM-7993
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core, test-failures, testing
>    Affects Versions: 2.15.0
>            Reporter: Udi Meiri
>            Assignee: Mark Liu
>            Priority: Major
>              Labels: currently-failing
>             Fix For: 2.16.0
>
>         Attachments: Python_Portable_Precommit.pdf
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> I'm not sure what the root cause is here.
> Example log where :sdks:python:test-suites:portable:py35:portableWordCountBatch failed:
> {code}
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22 	at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
> 11:51:22 	at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
> 11:51:22 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
> 11:51:22 	at java.lang.Thread.run(Thread.java:748)
> 11:51:22 Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22 	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
> 11:51:22 	at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211)
> 11:51:22 	at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202)
> 11:51:22 	at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
> 11:51:22 	at org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49)
> 11:51:22 	at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203)
> 11:51:22 	at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129)
> 11:51:22 	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
> 11:51:22 	at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494)
> 11:51:22 	... 3 more
> {code}
> https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5512/consoleFull
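The failing call in the log is `docker inspect -f {{.State.Running}} <container id>`, which exited with code 1 and "No such object", meaning the SDK harness container did not exist (it had already been removed, or was never created) when the runner checked it. A minimal Python sketch of how that check's result can be interpreted, assuming only the docker CLI behaviour visible in the log; this is not Beam's Java implementation, and `interpret_inspect` / `container_running` are hypothetical names:

```python
import subprocess
from typing import Optional


def interpret_inspect(returncode: int, stdout: str, stderr: str) -> Optional[bool]:
    """Map a `docker inspect -f {{.State.Running}}` result to a status.

    True/False: Docker answered with the container's Running state.
    None: Docker reported the container does not exist, as in the log above.
    """
    if returncode == 0:
        return stdout.strip() == "true"
    if "No such object" in stderr:
        return None  # container is gone (or never existed)
    raise RuntimeError(f"docker inspect failed ({returncode}): {stderr.strip()}")


def container_running(container_id: str) -> Optional[bool]:
    """Run the same command as in the log and interpret its result."""
    proc = subprocess.run(
        ["docker", "inspect", "-f", "{{.State.Running}}", container_id],
        capture_output=True,
        text=True,
    )
    return interpret_inspect(proc.returncode, proc.stdout, proc.stderr)
```

Keeping the interpretation in a pure function makes it testable without a Docker daemon, and returning None rather than raising makes "container disappeared" visible as its own case instead of a generic IOException.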



--
This message was sent by Atlassian Jira
(v8.3.2#803003)