Posted to issues@tez.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2021/02/03 09:32:00 UTC

[jira] [Comment Edited] (TEZ-4273) Clear off staging files when TezYarnClient is unable to submit applications

    [ https://issues.apache.org/jira/browse/TEZ-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277842#comment-17277842 ] 

László Bodor edited comment on TEZ-4273 at 2/3/21, 9:31 AM:
------------------------------------------------------------

It looks like this still leaks some directories that we need to take care of.

1. Set the default queue to 0%.
2. Start HiveServer2.
3. Submission to the default queue fails and keeps being retried.

{code}
2021-02-03 09:14:15,225 INFO  org.apache.tez.client.TezClient: [main]: Staging dir hdfs://lbodor-hiveontez-3.lbodor-hiveontez.root.hwx.site:8020/tmp/hive/hive/_tez_session_dir/3f09b6c8-c4f7-48b4-b371-d46cb0ecb182/.tez/application_1607511534794_0015, deleted:true
...
2021-02-03 09:15:17,440 INFO  org.apache.tez.client.TezClient: [main]: Staging dir hdfs://lbodor-hiveontez-3.lbodor-hiveontez.root.hwx.site:8020/tmp/hive/hive/_tez_session_dir/44b2de9a-bb91-410b-85d6-cd17656bb005/.tez/application_1607511534794_0016, deleted:true
...
2021-02-03 09:16:19,592 INFO  org.apache.tez.client.TezClient: [main]: Staging dir hdfs://lbodor-hiveontez-3.lbodor-hiveontez.root.hwx.site:8020/tmp/hive/hive/_tez_session_dir/73b40221-4f67-41ec-b814-5a5cc49a1e95/.tez/application_1607511534794_0017, deleted:true
...
2021-02-03 09:17:21,702 INFO  org.apache.tez.client.TezClient: [main]: Staging dir hdfs://lbodor-hiveontez-3.lbodor-hiveontez.root.hwx.site:8020/tmp/hive/hive/_tez_session_dir/e066e484-23eb-4ff2-9cef-3e96f1a06d8d/.tez/application_1607511534794_0018, deleted:true
...
{code}

leftovers:
{code}
drwx------   - hive supergroup          0 2021-02-03 09:14 /tmp/hive/hive/_tez_session_dir/3f09b6c8-c4f7-48b4-b371-d46cb0ecb182
drwx------   - hive supergroup          0 2021-02-03 09:14 /tmp/hive/hive/_tez_session_dir/3f09b6c8-c4f7-48b4-b371-d46cb0ecb182/.tez
drwx------   - hive supergroup          0 2021-02-03 09:14 /tmp/hive/hive/_tez_session_dir/3f09b6c8-c4f7-48b4-b371-d46cb0ecb182-resources
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:14 /tmp/hive/hive/_tez_session_dir/3f09b6c8-c4f7-48b4-b371-d46cb0ecb182-resources/hive-exec-3.1.3000.7.1.5.0-257.jar
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:14 /tmp/hive/hive/_tez_session_dir/3f09b6c8-c4f7-48b4-b371-d46cb0ecb182-resources/hive-exec.jar
drwx------   - hive supergroup          0 2021-02-03 09:15 /tmp/hive/hive/_tez_session_dir/44b2de9a-bb91-410b-85d6-cd17656bb005
drwx------   - hive supergroup          0 2021-02-03 09:15 /tmp/hive/hive/_tez_session_dir/44b2de9a-bb91-410b-85d6-cd17656bb005/.tez
drwx------   - hive supergroup          0 2021-02-03 09:15 /tmp/hive/hive/_tez_session_dir/44b2de9a-bb91-410b-85d6-cd17656bb005-resources
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:15 /tmp/hive/hive/_tez_session_dir/44b2de9a-bb91-410b-85d6-cd17656bb005-resources/hive-exec-3.1.3000.7.1.5.0-257.jar
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:15 /tmp/hive/hive/_tez_session_dir/44b2de9a-bb91-410b-85d6-cd17656bb005-resources/hive-exec.jar
drwx------   - hive supergroup          0 2021-02-03 09:16 /tmp/hive/hive/_tez_session_dir/73b40221-4f67-41ec-b814-5a5cc49a1e95
drwx------   - hive supergroup          0 2021-02-03 09:16 /tmp/hive/hive/_tez_session_dir/73b40221-4f67-41ec-b814-5a5cc49a1e95/.tez
drwx------   - hive supergroup          0 2021-02-03 09:16 /tmp/hive/hive/_tez_session_dir/73b40221-4f67-41ec-b814-5a5cc49a1e95-resources
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:16 /tmp/hive/hive/_tez_session_dir/73b40221-4f67-41ec-b814-5a5cc49a1e95-resources/hive-exec-3.1.3000.7.1.5.0-257.jar
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:16 /tmp/hive/hive/_tez_session_dir/73b40221-4f67-41ec-b814-5a5cc49a1e95-resources/hive-exec.jar
drwx------   - hive supergroup          0 2021-02-03 09:18 /tmp/hive/hive/_tez_session_dir/a2e8e411-ebbf-4483-94e0-182fba45396e
drwx------   - hive supergroup          0 2021-02-03 09:18 /tmp/hive/hive/_tez_session_dir/a2e8e411-ebbf-4483-94e0-182fba45396e/.tez
drwx------   - hive supergroup          0 2021-02-03 09:18 /tmp/hive/hive/_tez_session_dir/a2e8e411-ebbf-4483-94e0-182fba45396e-resources
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:18 /tmp/hive/hive/_tez_session_dir/a2e8e411-ebbf-4483-94e0-182fba45396e-resources/hive-exec-3.1.3000.7.1.5.0-257.jar
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:18 /tmp/hive/hive/_tez_session_dir/a2e8e411-ebbf-4483-94e0-182fba45396e-resources/hive-exec.jar
drwx------   - hive supergroup          0 2021-02-03 09:17 /tmp/hive/hive/_tez_session_dir/e066e484-23eb-4ff2-9cef-3e96f1a06d8d
drwx------   - hive supergroup          0 2021-02-03 09:17 /tmp/hive/hive/_tez_session_dir/e066e484-23eb-4ff2-9cef-3e96f1a06d8d/.tez
drwx------   - hive supergroup          0 2021-02-03 09:17 /tmp/hive/hive/_tez_session_dir/e066e484-23eb-4ff2-9cef-3e96f1a06d8d-resources
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:17 /tmp/hive/hive/_tez_session_dir/e066e484-23eb-4ff2-9cef-3e96f1a06d8d-resources/hive-exec-3.1.3000.7.1.5.0-257.jar
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:17 /tmp/hive/hive/_tez_session_dir/e066e484-23eb-4ff2-9cef-3e96f1a06d8d-resources/hive-exec.jar
{code}

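It seems that "TezCommonUtils.getTezSystemStagingPath" is not the right path to clean up here: judging by the logs, only the per-application directory under ".tez" gets deleted, while the session directory, its ".tez" subdirectory, and the "-resources" sibling are left behind.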
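
To make the gap concrete, here is a minimal sketch in plain Java (no Hadoop or Tez dependencies) that models the paths from the listing above. It assumes, based only on the "Staging dir ..., deleted:true" log lines, that the cleanup removes <session dir>/.tez/<application id> and nothing else. The class and method names are hypothetical and are not Tez APIs.

```java
import java.util.ArrayList;
import java.util.List;

class StagingLeakSketch {
    // Hypothetical helper: the path the cleanup appears to delete,
    // inferred from the "Staging dir ..., deleted:true" log lines.
    static String appStagingPath(String sessionDir, String appId) {
        return sessionDir + "/.tez/" + appId;
    }

    // Paths that exist for one session attempt, modeled on the listing.
    static List<String> createdPaths(String sessionDir, String appId) {
        List<String> paths = new ArrayList<>();
        paths.add(sessionDir);
        paths.add(sessionDir + "/.tez");
        paths.add(appStagingPath(sessionDir, appId));
        paths.add(sessionDir + "-resources");
        paths.add(sessionDir + "-resources/hive-exec.jar");
        return paths;
    }

    // Paths that survive a recursive delete of the per-application
    // staging directory only.
    static List<String> leftovers(String sessionDir, String appId) {
        String deleted = appStagingPath(sessionDir, appId);
        List<String> left = new ArrayList<>();
        for (String p : createdPaths(sessionDir, appId)) {
            boolean removed = p.equals(deleted) || p.startsWith(deleted + "/");
            if (!removed) {
                left.add(p);
            }
        }
        return left;
    }

    public static void main(String[] args) {
        String session =
            "/tmp/hive/hive/_tez_session_dir/3f09b6c8-c4f7-48b4-b371-d46cb0ecb182";
        for (String p : leftovers(session, "application_1607511534794_0015")) {
            System.out.println(p);
        }
    }
}
```

Under these assumptions the session directory, its ".tez" subdirectory, and the "-resources" directory with its contents all survive, which matches the leftovers listed above.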


was (Author: abstractdog):
looks like it still leaks some folder that we need to take care of

1. set default queue to 0%
2. started hiveserver2

{code}
2021-02-03 09:14:15,225 INFO  org.apache.tez.client.TezClient: [main]: Staging dir hdfs://lbodor-hiveontez-3.lbodor-hiveontez.root.hwx.site:8020/tmp/hive/hive/_tez_session_dir/3f09b6c8-c4f7-48b4-b371-d46cb0ecb182/.tez/application_1607511534794_0015, deleted:true
...
2021-02-03 09:15:17,440 INFO  org.apache.tez.client.TezClient: [main]: Staging dir hdfs://lbodor-hiveontez-3.lbodor-hiveontez.root.hwx.site:8020/tmp/hive/hive/_tez_session_dir/44b2de9a-bb91-410b-85d6-cd17656bb005/.tez/application_1607511534794_0016, deleted:true
...
2021-02-03 09:16:19,592 INFO  org.apache.tez.client.TezClient: [main]: Staging dir hdfs://lbodor-hiveontez-3.lbodor-hiveontez.root.hwx.site:8020/tmp/hive/hive/_tez_session_dir/73b40221-4f67-41ec-b814-5a5cc49a1e95/.tez/application_1607511534794_0017, deleted:true
...
2021-02-03 09:17:21,702 INFO  org.apache.tez.client.TezClient: [main]: Staging dir hdfs://lbodor-hiveontez-3.lbodor-hiveontez.root.hwx.site:8020/tmp/hive/hive/_tez_session_dir/e066e484-23eb-4ff2-9cef-3e96f1a06d8d/.tez/application_1607511534794_0018, deleted:true
...
{code}

leftovers:
{code}
drwx------   - hive supergroup          0 2021-02-03 09:14 /tmp/hive/hive/_tez_session_dir/3f09b6c8-c4f7-48b4-b371-d46cb0ecb182
drwx------   - hive supergroup          0 2021-02-03 09:14 /tmp/hive/hive/_tez_session_dir/3f09b6c8-c4f7-48b4-b371-d46cb0ecb182/.tez
drwx------   - hive supergroup          0 2021-02-03 09:14 /tmp/hive/hive/_tez_session_dir/3f09b6c8-c4f7-48b4-b371-d46cb0ecb182-resources
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:14 /tmp/hive/hive/_tez_session_dir/3f09b6c8-c4f7-48b4-b371-d46cb0ecb182-resources/hive-exec-3.1.3000.7.1.5.0-257.jar
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:14 /tmp/hive/hive/_tez_session_dir/3f09b6c8-c4f7-48b4-b371-d46cb0ecb182-resources/hive-exec.jar
drwx------   - hive supergroup          0 2021-02-03 09:15 /tmp/hive/hive/_tez_session_dir/44b2de9a-bb91-410b-85d6-cd17656bb005
drwx------   - hive supergroup          0 2021-02-03 09:15 /tmp/hive/hive/_tez_session_dir/44b2de9a-bb91-410b-85d6-cd17656bb005/.tez
drwx------   - hive supergroup          0 2021-02-03 09:15 /tmp/hive/hive/_tez_session_dir/44b2de9a-bb91-410b-85d6-cd17656bb005-resources
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:15 /tmp/hive/hive/_tez_session_dir/44b2de9a-bb91-410b-85d6-cd17656bb005-resources/hive-exec-3.1.3000.7.1.5.0-257.jar
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:15 /tmp/hive/hive/_tez_session_dir/44b2de9a-bb91-410b-85d6-cd17656bb005-resources/hive-exec.jar
drwx------   - hive supergroup          0 2021-02-03 09:16 /tmp/hive/hive/_tez_session_dir/73b40221-4f67-41ec-b814-5a5cc49a1e95
drwx------   - hive supergroup          0 2021-02-03 09:16 /tmp/hive/hive/_tez_session_dir/73b40221-4f67-41ec-b814-5a5cc49a1e95/.tez
drwx------   - hive supergroup          0 2021-02-03 09:16 /tmp/hive/hive/_tez_session_dir/73b40221-4f67-41ec-b814-5a5cc49a1e95-resources
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:16 /tmp/hive/hive/_tez_session_dir/73b40221-4f67-41ec-b814-5a5cc49a1e95-resources/hive-exec-3.1.3000.7.1.5.0-257.jar
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:16 /tmp/hive/hive/_tez_session_dir/73b40221-4f67-41ec-b814-5a5cc49a1e95-resources/hive-exec.jar
drwx------   - hive supergroup          0 2021-02-03 09:18 /tmp/hive/hive/_tez_session_dir/a2e8e411-ebbf-4483-94e0-182fba45396e
drwx------   - hive supergroup          0 2021-02-03 09:18 /tmp/hive/hive/_tez_session_dir/a2e8e411-ebbf-4483-94e0-182fba45396e/.tez
drwx------   - hive supergroup          0 2021-02-03 09:18 /tmp/hive/hive/_tez_session_dir/a2e8e411-ebbf-4483-94e0-182fba45396e-resources
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:18 /tmp/hive/hive/_tez_session_dir/a2e8e411-ebbf-4483-94e0-182fba45396e-resources/hive-exec-3.1.3000.7.1.5.0-257.jar
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:18 /tmp/hive/hive/_tez_session_dir/a2e8e411-ebbf-4483-94e0-182fba45396e-resources/hive-exec.jar
drwx------   - hive supergroup          0 2021-02-03 09:17 /tmp/hive/hive/_tez_session_dir/e066e484-23eb-4ff2-9cef-3e96f1a06d8d
drwx------   - hive supergroup          0 2021-02-03 09:17 /tmp/hive/hive/_tez_session_dir/e066e484-23eb-4ff2-9cef-3e96f1a06d8d/.tez
drwx------   - hive supergroup          0 2021-02-03 09:17 /tmp/hive/hive/_tez_session_dir/e066e484-23eb-4ff2-9cef-3e96f1a06d8d-resources
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:17 /tmp/hive/hive/_tez_session_dir/e066e484-23eb-4ff2-9cef-3e96f1a06d8d-resources/hive-exec-3.1.3000.7.1.5.0-257.jar
-rw-r--r--   2 hive supergroup   45489914 2021-02-03 09:17 /tmp/hive/hive/_tez_session_dir/e066e484-23eb-4ff2-9cef-3e96f1a06d8d-resources/hive-exec.jar
{code}

seems like "TezCommonUtils.getTezSystemStagingPath" is not the perfect one here

> Clear off staging files when TezYarnClient is unable to submit applications
> ---------------------------------------------------------------------------
>
>                 Key: TEZ-4273
>                 URL: https://issues.apache.org/jira/browse/TEZ-4273
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Priority: Major
>         Attachments: TEZ-4273.1.patch
>
>
> Currently, a few resources such as "tez-conf.pb" are left behind when an exception is encountered during app submission. This causes issues in the cluster when apps keep submitting queries continuously.
> {noformat}
> drwx------   - hive supergroup          0 2021-01-28 01:58 /tmp/hive/hive/_tez_session_dir/806cd302-abd0-4694-be4a-b2b2473e75a8/.tez/application_1611791897439_0042
> -rw-r--r--   3 hive supergroup     135519 2021-01-28 01:58 /tmp/hive/hive/_tez_session_dir/806cd302-abd0-4694-be4a-b2b2473e75a8/.tez/application_1611791897439_0042/tez-conf.pb
> -rw-r--r--   3 hive supergroup       1056 2021-01-28 01:58 /tmp/hive/hive/_tez_session_dir/806cd302-abd0-4694-be4a-b2b2473e75a8/.tez/application_1611791897439_0042/tez.session.local-resources.pb	
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1611791897439_0042 to YARN : org.apache.hadoop.security.AccessControlException: Queue root.default already has X applications from user hive cannot accept submission of application: application_1611791897439_0042
> 	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:322) ~[hadoop-yarn-client-3.x:?]
> 	at org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:77) ~[tez-api-0.9.x.jar:0.9.x]
> 	at org.apache.tez.client.TezClient.start(TezClient.java:405) ~[tez-api-0.9.x.jar:0.9.x]
> 	at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezSessionState.java:535) ~[hive-exec-3.x]{noformat}
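
The failure mode in the quoted description can be sketched in plain Java against the local filesystem. This is an illustration only, not the TEZ-4273 patch: the Submitter interface, the method names, and the use of java.nio.file in place of the Hadoop FileSystem API are all assumptions made to keep the example self-contained. The idea is to delete the per-application staging directory when submission throws, then rethrow the original exception.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class SubmitCleanupSketch {
    /** Hypothetical stand-in for the call that submits the application. */
    interface Submitter {
        void submit(String appId) throws Exception;
    }

    // Delete a directory tree, children before parents.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Stage a file, try to submit, and remove the staged files on failure.
    static void submitWithCleanup(Path sessionDir, String appId, Submitter submitter)
            throws Exception {
        Path appStaging = sessionDir.resolve(".tez").resolve(appId);
        Files.createDirectories(appStaging);
        Files.write(appStaging.resolve("tez-conf.pb"), new byte[0]);
        try {
            submitter.submit(appId);
        } catch (Exception e) {
            try {
                deleteRecursively(appStaging);
            } catch (IOException | UncheckedIOException cleanupFailure) {
                // Keep the submission error as the primary failure.
                e.addSuppressed(cleanupFailure);
            }
            throw e;
        }
    }

    // Runs one failing submission in a temp directory and reports whether
    // the per-application staging directory still exists afterwards.
    static boolean stagingSurvivesFailedSubmit() {
        String appId = "application_0_0001";
        try {
            Path session = Files.createTempDirectory("tez_session_sketch");
            try {
                try {
                    submitWithCleanup(session, appId, id -> {
                        throw new IllegalStateException("queue rejected " + id);
                    });
                } catch (IllegalStateException expected) {
                    // The simulated rejection is expected here.
                }
                return Files.exists(session.resolve(".tez").resolve(appId));
            } finally {
                deleteRecursively(session);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("staging dir survives failed submit: "
            + stagingSurvivesFailedSubmit());
    }
}
```

Note that this sketch removes only the per-application directory. The session directory and its ".tez" parent are deliberately left in place, so a real fix would still need to decide who owns those.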



--
This message was sent by Atlassian Jira
(v8.3.4#803005)