Posted to issues@flink.apache.org by "Matthias Pohl (Jira)" <ji...@apache.org> on 2023/03/01 09:17:00 UTC
[jira] [Comment Edited] (FLINK-31278) exit code 137 (i.e. OutOfMemoryError) in core module
[ https://issues.apache.org/jira/browse/FLINK-31278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694943#comment-17694943 ]
Matthias Pohl edited comment on FLINK-31278 at 3/1/23 9:16 AM:
---------------------------------------------------------------
No heap dump is available because the upload step failed. Based on the Maven output, I extracted the tests that were running when the error happened:
{code}
$ grep -e " Tests run: " -e "\[INFO\] Running" 20230301.3.txt | grep -o "org.apache.flink.[a-zA-Z\.]*" | sort | uniq -c | sort -n | head -5
1 org.apache.flink.runtime.dispatcher.MemoryExecutionGraphInfoStoreTest
1 org.apache.flink.runtime.io.disk.ChannelViewsTest
1 org.apache.flink.runtime.io.disk.FileChannelManagerImplTest
1 org.apache.flink.runtime.io.disk.iomanager.AsynchronousFileIOChannelTest
2 org.apache.flink.api.common.accumulators.AverageAccumulatorTest
{code}
That said, this is not necessarily an indication of the cause.
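The grep pipeline above relies on a class name appearing only once when its test started but never reported a result. A minimal Python sketch of the same idea, which separates the "Running" lines from the "Tests run" lines explicitly, might look like this. The function name {{unfinished_tests}} and the regular expressions are illustrative assumptions and not part of any Flink or Maven tooling.

```python
import re

# The two Maven Surefire line types that the grep command filters on.
RUNNING = re.compile(r"\[INFO\] Running (org\.apache\.flink\.[\w.$]+)")
FINISHED = re.compile(r"Tests run: .* in (org\.apache\.flink\.[\w.$]+)")


def unfinished_tests(lines):
    """Return test classes that logged 'Running' but no result line, in start order."""
    started = []
    finished = set()
    for line in lines:
        match = RUNNING.search(line)
        if match:
            started.append(match.group(1))
            continue
        match = FINISHED.search(line)
        if match:
            finished.add(match.group(1))
    return [name for name in started if name not in finished]


if __name__ == "__main__":
    # Assumes the build log was saved locally under the name used in the grep command.
    with open("20230301.3.txt", encoding="utf-8", errors="replace") as log:
        for name in unfinished_tests(log):
            print(name)
```

Unlike the count-based approach, this does not list a class whose "Running" line fell outside the captured log but whose result line is present.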
We see that {{ChannelViewsTest}} had been running somewhat longer than the other tests when the error occurred:
{code}
2023-03-01T05:28:56.0284123Z Mar 01 05:28:56 [INFO] Running org.apache.flink.runtime.io.disk.ChannelViewsTest
2023-03-01T05:29:03.2024639Z Mar 01 05:29:03 [INFO] Running org.apache.flink.runtime.io.disk.FileChannelManagerImplTest
2023-03-01T05:29:03.8510602Z Mar 01 05:29:03 [INFO] Running org.apache.flink.runtime.io.disk.iomanager.AsynchronousFileIOChannelTest
2023-03-01T05:29:20.9205409Z Mar 01 05:29:08 [INFO] Running org.apache.flink.runtime.dispatcher.MemoryExecutionGraphInfoStoreTest
{code}
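To make the comparison of start times concrete, the following sketch computes how long each started test had been running relative to the last "Running" line in the log. Using the last logged line as the reference point is an assumption, since the exact time of the kill is not in the output. The helper name {{seconds_running}} is hypothetical, and the year defaults to 2023 as in the Azure timestamps above.

```python
import re
from datetime import datetime

# Matches the Maven-side timestamp and the class name of a "Running" line.
LINE = re.compile(r"(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \[INFO\] Running (\S+)")


def seconds_running(lines, year=2023):
    """Map each started test class to the seconds between its start and the last start."""
    starts = []
    for line in lines:
        match = LINE.search(line)
        if match:
            stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            starts.append((match.group(2), stamp))
    if not starts:
        return {}
    # Reference point: the most recent "Running" line (stand-in for the crash time).
    last = max(stamp for _, stamp in starts)
    return {name: (last - stamp).total_seconds() for name, stamp in starts}
```

Applied to the four lines above, this yields 12 seconds for {{ChannelViewsTest}} and 5 seconds each for the two other I/O tests, which matches the observation but does not by itself identify the cause.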
> exit code 137 (i.e. OutOfMemoryError) in core module
> ----------------------------------------------------
>
> Key: FLINK-31278
> URL: https://issues.apache.org/jira/browse/FLINK-31278
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.17.0
> Reporter: Matthias Pohl
> Priority: Blocker
> Labels: test-stability
>
> The following build failed due to a 137 exit code indicating an OutOfMemoryError:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46643&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef&l=7847
> {code}
> [...]
> Mar 01 05:29:06 [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.65 s - in org.apache.flink.runtime.io.compression.BlockCompressionTest
> Mar 01 05:29:06 [INFO] Running org.apache.flink.runtime.dispatcher.DispatcherCachedOperationsHandlerTest
> Mar 01 05:29:07 [INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.142 s - in org.apache.flink.runtime.dispatcher.DispatcherCachedOperationsHandlerTest
> Mar 01 05:29:08 [INFO] Running org.apache.flink.runtime.dispatcher.MemoryExecutionGraphInfoStoreTest
> ##[error]Exit code 137 returned from process: file name '/usr/bin/docker', arguments 'exec -i -u 1001 -w /home/vsts_azpcontainer 5953b171e8ed4caba7af2b326533e249211ed4dcc48640edb3c1b0cbbcdf1a21 /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - core
> {code}
> This build ran on an Azure pipeline machine (Azure Pipelines 9) and therefore cannot be caused by FLINK-18356. That said, a second build failed with exit code 137 at around the same time, roughly 10 minutes later, on agent "Azure Pipelines 21" (see [20230301.3|https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46643&view=logs&j=77a9d8e1-d610-59b3-fc2a-4766541e0e33&t=125e07e7-8de0-5c6c-a541-a567415af3ef&l=7847]).