You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/09/14 23:15:00 UTC

[jira] [Commented] (IMPALA-10669) Loading nested ORC data is flaky during Docker-based tests

    [ https://issues.apache.org/jira/browse/IMPALA-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415223#comment-17415223 ] 

Quanlong Huang commented on IMPALA-10669:
-----------------------------------------

Also saw this in a build: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/14799
{code}
Loading nested orc data (logging to /home/ubuntu/Impala/logs/data_loading/load-nested.log)... 
    FAILED (Took: 2 min 16 sec)
    '/home/ubuntu/Impala/testdata/bin/load_nested.py -t tpch_nested_orc_def -f orc/def' failed. Tail of log:
2021-09-14 18:53:02,523 INFO:load_nested[348]:Executing: 

      CREATE EXTERNAL TABLE supplier
      STORED AS orc
      TBLPROPERTIES('orc.compress' = 'ZLIB','external.table.purge'='TRUE')
      AS SELECT * FROM tmp_supplier
Traceback (most recent call last):
  File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 415, in <module>
    load()
  File "/home/ubuntu/Impala/testdata/bin/load_nested.py", line 349, in load
    hive.execute(stmt)
  File "/home/ubuntu/Impala/tests/comparison/db_connection.py", line 206, in execute
    return self._cursor.execute(sql, *args, **kwargs)
  File "/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py", line 343, in execute
    self._wait_to_finish()  # make execute synchronous
  File "/home/ubuntu/Impala/infra/python/env-gcc7.5.0/lib/python2.7/site-packages/impala/hiveserver2.py", line 427, in _wait_to_finish
    raise OperationalError(resp.errorMessage)
impala.error.OperationalError: Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1631643014075_0039_2_00, diagnostics=[Task failed, taskId=task_1631643014075_0039_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_1631643014075_0039_01_000002 finished with diagnostics set to [Container failed, exitCode=-104. [2021-09-14 18:53:07.711]Container [pid=106725,containerID=container_1631643014075_0039_01_000002] is running 14094336B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1631643014075_0039_01_000002 :
	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
	|- 106725 106723 106725 106725 (bash) 0 0 11546624 742 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java  -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000002 -Dtez.root.logger=INFO,CLA  -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000002/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000002 application_1631643014075_0039 1 1>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000002/stdout 2>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000002/stderr  
	|- 106735 106725 106725 106725 (java) 1780 265 2673709056 264843 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000002 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000002/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000002 application_1631643014075_0039 1 

[2021-09-14 18:53:07.719]Container killed on request. Exit code is 143
[2021-09-14 18:53:07.719]Container exited with a non-zero exit code 143. 
]], TaskAttempt 1 failed, info=[Container container_1631643014075_0039_01_000003 finished with diagnostics set to [Container failed, exitCode=-104. [2021-09-14 18:53:16.803]Container [pid=106885,containerID=container_1631643014075_0039_01_000003] is running 21422080B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1631643014075_0039_01_000003 :
	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
	|- 106885 106883 106885 106885 (bash) 0 0 11546624 752 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java  -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000003 -Dtez.root.logger=INFO,CLA  -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000003/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000003 application_1631643014075_0039 1 1>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000003/stdout 2>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000003/stderr  
	|- 106895 106885 106885 106885 (java) 1507 224 2661277696 266622 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000003 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000003/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000003 application_1631643014075_0039 1 

[2021-09-14 18:53:16.811]Container killed on request. Exit code is 143
[2021-09-14 18:53:16.811]Container exited with a non-zero exit code 143. 
]], TaskAttempt 2 failed, info=[Container container_1631643014075_0039_01_000004 finished with diagnostics set to [Container failed, exitCode=-104. [2021-09-14 18:53:25.895]Container [pid=107002,containerID=container_1631643014075_0039_01_000004] is running 7229440B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1631643014075_0039_01_000004 :
	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
	|- 107002 107000 107002 107002 (bash) 0 0 11546624 767 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java  -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000004 -Dtez.root.logger=INFO,CLA  -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000004/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000004 application_1631643014075_0039 1 1>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000004/stdout 2>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000004/stderr  
	|- 107012 107002 107002 107002 (java) 1508 227 2689318912 263142 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000004 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000004/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000004 application_1631643014075_0039 1 

[2021-09-14 18:53:25.904]Container killed on request. Exit code is 143
[2021-09-14 18:53:25.904]Container exited with a non-zero exit code 143. 
]], TaskAttempt 3 failed, info=[Container container_1631643014075_0039_01_000005 finished with diagnostics set to [Container failed, exitCode=-104. [2021-09-14 18:53:34.986]Container [pid=107126,containerID=container_1631643014075_0039_01_000005] is running 12509184B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1631643014075_0039_01_000005 :
	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
	|- 107126 107124 107126 107126 (bash) 0 0 11546624 756 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java  -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000005 -Dtez.root.logger=INFO,CLA  -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000005/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000005 application_1631643014075_0039 1 1>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000005/stdout 2>/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000005/stderr  
	|- 107136 107126 107126 107126 (java) 1509 250 2666205184 264442 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1631643014075_0039/container_1631643014075_0039_01_000005 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/ubuntu/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1631643014075_0039/container_1631643014075_0039_01_000005/tmp org.apache.tez.runtime.task.TezChild localhost 33614 container_1631643014075_0039_01_000005 application_1631643014075_0039 1 

[2021-09-14 18:53:34.994]Container killed on request. Exit code is 143
[2021-09-14 18:53:34.994]Container exited with a non-zero exit code 143. 
]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1631643014075_0039_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
{code}

> Loading nested ORC data is flaky during Docker-based tests
> ----------------------------------------------------------
>
>                 Key: IMPALA-10669
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10669
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 4.0.0
>            Reporter: Laszlo Gaal
>            Assignee: Laszlo Gaal
>            Priority: Major
>
> Docker-based tests (using {{docker/test-wirh-docker.py}} often fail in the dataload phase when trying to load ORC tables with complex types. The failure happens quite often (at least in about 50% of the runs), and when it happens, the failure pattern is quite consistent: it is always a Tez container overrunning its allotted memory.
> The signature is:
> {code}
> 2021-04-18 13:32:19.551921 [2021-04-18 13:31:51.355]Container killed on request. Exit code is 143
> 2021-04-18 13:32:19.551966 [2021-04-18 13:31:51.356]Container exited with a non-zero exit code 143. 
> 2021-04-18 13:32:19.552181 ]], TaskAttempt 1 failed, info=[Container container_1618776748992_0039_01_000003 finished with diagnostics set to [Container failed, exitCode=-104. [2021-04-18 13:32:00.379]Container [pid=11530,containerID=container_1618776748992_0039_01_000003] is running 2785280B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
> 2021-04-18 13:32:19.552224 Dump of the process-tree for container_1618776748992_0039_01_000003 :
> 2021-04-18 13:32:19.552298 	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> 2021-04-18 13:32:19.552753 	|- 11540 11530 11530 11530 (java) 2048 85 2761297920 262152 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-2.el8_3.x86_64/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/impdev/nm-local-dir/usercache/impdev/appcache/application_1618776748992_0039/container_1618776748992_0039_01_000003/tmp org.apache.tez.runtime.task.TezChild localhost 38999 container_1618776748992_0039_01_000003 application_1618776748992_0039 1 
> 2021-04-18 13:32:19.553375 	|- 11530 11528 11530 11530 (bash) 0 0 10010624 672 /bin/bash -c /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-2.el8_3.x86_64/bin/java  -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN  -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003 -Dtez.root.logger=INFO,CLA  -Djava.io.tmpdir=/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/lib/hadoop-yarn/cache/impdev/nm-local-dir/usercache/impdev/appcache/application_1618776748992_0039/container_1618776748992_0039_01_000003/tmp org.apache.tez.runtime.task.TezChild localhost 38999 container_1618776748992_0039_01_000003 application_1618776748992_0039 1 1>/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003/stdout 2>/home/impdev/Impala/testdata/cluster/cdh7/node-1/var/log/hadoop-yarn/containers/application_1618776748992_0039/container_1618776748992_0039_01_000003/stderr  
> {code}
> The failure has only been seen on AWS m5.12xl instances so far, which have 192GB of RAM, all of which is available to the initial container doing the compile/link and dataload phases of a test run.
> The same code runs with no problems on m5.4xl (64GB RAM) and r5.4xl (128GB RAM) instances during other build jobs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org