You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Janek Bevendorff <ja...@uni-weimar.de> on 2022/01/27 14:40:18 UTC
Python pipeline on Flink not finishing cleanly
Hi,
When I run a Python pipeline with multiple concurrent TaskManagers on
Flink, the job hardly ever (or never) finishes properly. At the end,
Beam (or Flink?) always throws a seemingly random gRPC
IllegalStateException after my last GlobalCombine, so Beam goes into
some weird error handling mode and eventually fails to job, even though
it should have finished cleanly.
This is only reproducible with parallelism set to at least 5 or 8. With
1-4, I cannot reliably (or at all) reproduce it. It looks like a similar
issue has already been reported on Jira
(https://issues.apache.org/jira/browse/BEAM-8980), but it got marked as
stale. Anyone else seeing this? Is there anything I can do? I don't want
my job to restart after it's finished and I want a clean exit status,
otherwise I don't really know if everything succeeded properly and I
don't want to comb through hundreds of log files to find out.
I added a comment with the stacktraces that I get below the
above-mentioned issue:
https://issues.apache.org/jira/browse/BEAM-8980?focusedCommentId=17483174&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17483174
Janek