Posted to issues-all@impala.apache.org by "Michael Smith (Jira)" <ji...@apache.org> on 2022/06/15 17:30:00 UTC

[jira] [Resolved] (IMPALA-10669) Loading nested ORC data is flaky during Docker-based tests

     [ https://issues.apache.org/jira/browse/IMPALA-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Smith resolved IMPALA-10669.
------------------------------------
    Resolution: Duplicate

> Loading nested ORC data is flaky during Docker-based tests
> ----------------------------------------------------------
>
>                 Key: IMPALA-10669
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10669
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 4.0.0
>            Reporter: Laszlo Gaal
>            Assignee: Laszlo Gaal
>            Priority: Major
>
> Docker-based tests (using {{docker/test-with-docker.py}}) often fail in the dataload phase when trying to load ORC tables with complex types. The failure happens frequently (in at least about 50% of the runs), and when it happens, the failure pattern is consistent: it is always a Tez container overrunning its allotted memory.
> The signature is:
> {code}
> 2021-04-18 13:32:19.551921 [2021-04-18 13:31:51.355]Container killed on request. Exit code is 143
> 2021-04-18 13:32:19.551966 [2021-04-18 13:31:51.356]Container exited with a non-zero exit code 143. 
> 2021-04-18 13:32:19.552181 ]], TaskAttempt 1 failed, info=[Container container_1618776748992_0039_01_000003 finished with diagnostics set to [Container failed, exitCode=-104. [2021-04-18 13:32:00.379]Container [pid=11530,containerID=container_1618776748992_0039_01_000003] is running 2785280B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
> 2021-04-18 13:32:19.552224 Dump of the process-tree for container_1618776748992_0039_01_000003 :
> 2021-04-18 13:32:19.552298 	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> 2021-04-18 13:32:19.552753 	|- 11540 11530 11530 11530 (java) 2048 85 2761297920 262152 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-2.el8_3.x86_64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/impdev/nm-local-dir/usercache/impdev/appcache/application_1618776748992_0039/container_1618776748992_0039_01_000003/tmp org.apache.tez.runtime.task.TezChild localhost 38999 container_1618776748992_0039_01_000003 application_1618776748992_0039 1 
> 2021-04-18 13:32:19.553375 	|- 11530 11528 11530 11530 (bash) 0 0 10010624 672 /bin/bash -c /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-2.el8_3.x86_64/bin/java  -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003 -Dtez.root.logger=INFO,CLA  -Djava.io.tmpdir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/impdev/nm-local-dir/usercache/impdev/appcache/application_1618776748992_0039/container_1618776748992_0039_01_000003/tmp org.apache.tez.runtime.task.TezChild localhost 38999 container_1618776748992_0039_01_000003 application_1618776748992_0039 1 1>/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003/stdout 2>/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003/stderr  
> {code}
> The failure has only been seen on AWS m5.12xl instances so far, which have 192GB of RAM, all of which is available to the initial container doing the compile/link and dataload phases of a test run.
> The same code runs with no problems on m5.4xl (64GB RAM) and r5.4xl (128GB RAM) instances during other build jobs.
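
Note on the numbers in the log signature: the reported overrun can be reproduced from the process-tree dump. The sketch below is only an illustration, not part of the original report. It assumes 4 KiB memory pages (typical for x86_64 Linux, but not stated in the log) and that "1 GB" in the YARN message means 1024 MiB; the function and variable names are made up for this example.

```python
# Reproduce the "running 2785280B beyond the 'PHYSICAL' memory limit" figure
# from the RSSMEM_USAGE(PAGES) column of the process-tree dump.

PAGE_SIZE = 4096            # assumption: 4 KiB pages, not stated in the log
LIMIT_BYTES = 1024 ** 3     # "1 GB physical memory", read as 1024 MiB

# RSSMEM_USAGE(PAGES) values copied from the two process-tree lines.
rss_pages = {
    "java (pid 11540)": 262152,
    "bash (pid 11530)": 672,
}


def overrun_bytes(pages_by_process, limit_bytes, page_size=PAGE_SIZE):
    """Return how many bytes the summed resident set exceeds the limit by."""
    used_bytes = sum(pages_by_process.values()) * page_size
    return used_bytes - limit_bytes


# Headroom between the JVM heap cap (-Xmx819m) and the container limit,
# i.e. what is left for metaspace, thread stacks, direct buffers, etc.
HEAP_MIB = 819
headroom_mib = LIMIT_BYTES // (1024 ** 2) - HEAP_MIB

if __name__ == "__main__":
    print("overrun bytes:", overrun_bytes(rss_pages, LIMIT_BYTES))
    print("non-heap headroom (MiB):", headroom_mib)
```

Under those assumptions the summed resident set of the java and bash processes exceeds the limit by exactly the 2785280 bytes quoted in the log, and the JVM has roughly 205 MiB of non-heap headroom inside the 1 GB container.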



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
