Posted to issues@beam.apache.org by "Hannah Jiang (Jira)" <ji...@apache.org> on 2019/09/01 04:30:00 UTC

[jira] [Comment Edited] (BEAM-7993) portable python precommit is flaky

    [ https://issues.apache.org/jira/browse/BEAM-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920279#comment-16920279 ] 

Hannah Jiang edited comment on BEAM-7993 at 9/1/19 4:29 AM:
------------------------------------------------------------

Thanks Mark for trying it out.

I analyzed the last 60 Python Portable Precommit tests on Jenkins to look for patterns. No obvious pattern emerged, but every test submitted to Jenkins agent12 failed. A failure is also more likely when multiple Python Portable Precommit tests run in parallel on the same agent, which means several sdist builds may be running in parallel (this depends on test start times: sdist runs in the first part of the job, so two tests that start close together may run sdist at the same time). However, the failure still happens when only one Python Portable Precommit test runs on an agent, so this is not enough to conclude that running sdist in parallel causes the issue. It is still worth checking.

I attached a list of these 60 tests in case it helps.
[^Python_Portable_Precommit.pdf] 
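
The per-agent and parallel-run checks described above could be automated roughly as follows. This is only a sketch under stated assumptions: the `Build` record, its field names, and the sample values are hypothetical and are not taken from the attached list or from the Jenkins API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Build:
    """One precommit run. All field names here are hypothetical."""
    number: int
    agent: str
    start: datetime
    duration: timedelta
    passed: bool

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def failure_rate_by_agent(builds):
    """Return {agent: (failed, total)} for the given builds."""
    counts = defaultdict(lambda: [0, 0])
    for b in builds:
        counts[b.agent][1] += 1
        if not b.passed:
            counts[b.agent][0] += 1
    return {agent: tuple(c) for agent, c in counts.items()}


def overlapped_builds(builds):
    """Return numbers of builds that ran at the same time as another build on the same agent."""
    by_agent = defaultdict(list)
    for b in builds:
        by_agent[b.agent].append(b)
    overlapped = set()
    for runs in by_agent.values():
        runs.sort(key=lambda b: b.start)
        latest = None  # build with the latest end time seen so far on this agent
        for b in runs:
            if latest is not None and b.start < latest.end:
                overlapped.update((latest.number, b.number))
            if latest is None or b.end > latest.end:
                latest = b
    return overlapped


# Made-up sample data, for illustration only.
t0 = datetime(2019, 1, 1, 12, 0)
half_hour = timedelta(minutes=30)
sample = [
    Build(1, "agent-a", t0, half_hour, True),
    Build(2, "agent-a", t0 + timedelta(minutes=10), half_hour, False),
    Build(3, "agent-b", t0, half_hour, False),
    Build(4, "agent-a", t0 + timedelta(hours=2), half_hour, True),
]
rates = failure_rate_by_agent(sample)
parallel = overlapped_builds(sample)
```

Comparing the failure rate of the builds in `parallel` with the rate of the remaining builds would show whether overlap correlates with failure; as noted above, a correlation alone would not prove that parallel sdist is the cause.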


was (Author: hannahjiang):
Thanks Mark for trying it out.

I listed the last 60 Python Portable Precommit tests on Jenkins to look for patterns. No obvious pattern emerged, but every test submitted to Jenkins agent12 failed. A failure is also more likely when multiple Python Portable Precommit tests run in parallel on the same agent, which means we are running sdist in parallel in this case. However, the failure still happens when only one Python Portable Precommit test runs on an agent, so this is not enough to conclude that running sdist in parallel causes the issue. It is still worth checking.

I attached a list of these 60 tests in case it helps.
[^Python_Portable_Precommit.pdf] 

> portable python precommit is flaky
> ----------------------------------
>
>                 Key: BEAM-7993
>                 URL: https://issues.apache.org/jira/browse/BEAM-7993
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core, test-failures, testing
>    Affects Versions: 2.15.0
>            Reporter: Udi Meiri
>            Assignee: Mark Liu
>            Priority: Major
>              Labels: currently-failing
>             Fix For: 2.16.0
>
>         Attachments: Python_Portable_Precommit.pdf
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> I'm not sure what the root cause is here.
> Example log where :sdks:python:test-suites:portable:py35:portableWordCountBatch failed:
> {code}
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)
> 11:51:22 [CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code:  CHAIN MapPartition (MapPartition at [2]write/Write/WriteImpl/DoOnce/{FlatMap(<lambda at core.py:2457>), Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)
> 11:51:22 java.lang.Exception: The user defined 'open()' method caused an exception: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22 	at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
> 11:51:22 	at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
> 11:51:22 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
> 11:51:22 	at java.lang.Thread.run(Thread.java:748)
> 11:51:22 Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.io.IOException: Received exit code 1 for command 'docker inspect -f {{.State.Running}} 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: Error: No such object: 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1
> 11:51:22 	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
> 11:51:22 	at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211)
> 11:51:22 	at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202)
> 11:51:22 	at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
> 11:51:22 	at org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49)
> 11:51:22 	at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203)
> 11:51:22 	at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129)
> 11:51:22 	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
> 11:51:22 	at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494)
> 11:51:22 	... 3 more
> {code}
> https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5512/consoleFull


