Posted to user-zh@flink.apache.org by 苏 欣 <se...@live.com> on 2019/04/09 02:36:59 UTC

Re: Re: blink job submitted to YARN keeps repeatedly allocating containers


________________________________
seanlwj@live.com

From: 苏 欣<ma...@live.com>
Sent: 2019-04-09 10:30
To: user-zh@flink.apache.org<ma...@flink.apache.org>
Subject: Re: blink job submitted to YARN keeps repeatedly allocating containers
Sorry about that; I have now added the YARN log file.

I have found the cause of the problem: after setting the following three options in flink-conf.yaml, the job can no longer get resources allocated:
security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: /home/hive.keytab
security.kerberos.login.principal: hive/cdh129135@MYCDH
If I run the kinit command on the client machine before submitting, YARN allocates resources normally.
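(For reference, the kinit workaround is just the standard MIT Kerberos client commands run on the submitting machine, using the same keytab and principal configured above; this is a rough sketch of what I run, nothing Flink-specific:

kinit -kt /home/hive.keytab hive/cdh129135@MYCDH   # obtain a TGT from the keytab into the local ticket cache
klist                                              # confirm the cache now holds hive/cdh129135@MYCDH

Only after doing this does the submitted job actually keep its TM containers.)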
I now have a few questions for you all:

1. When submitting a job to a Kerberos-secured YARN cluster, is there any way other than kinit? Why does reading the credentials from the config file lead to exit code 31?

2. On YARN, are taskmanager.cpu.core and the number of slots always equal? Is there ever a case where one core is assigned multiple slots? (See the config sketch just below.)
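For context, these are the settings I am comparing; this is only a sketch using the open-source Flink option names, and I am not sure the Blink branch uses exactly the same keys or defaults:

# flink-conf.yaml (illustrative values, matching the <memory:2048, vCores:6> container requests in my log)
taskmanager.numberOfTaskSlots: 6   # slots offered by each TaskManager
yarn.containers.vcores: 6          # vcores requested per YARN container; in open-source Flink this defaults to the slot count
taskmanager.cpu.core: 6            # the Blink-specific option my question refers to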


Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10

From: Zili Chen<ma...@gmail.com>
Sent: 2019-04-08 19:29
To: user-zh@flink.apache.org<ma...@flink.apache.org>
Subject: Re: blink job submitted to YARN keeps repeatedly allocating containers

Hi, the Apache mailing lists do not support inline images; please reference them as attachments or links.

Best,
tison.


苏 欣 <se...@live.com> wrote on Mon, Apr 8, 2019 at 10:17 AM:

> I submitted a job to YARN in per-job mode and noticed that containers keep being allocated over and over again.
>
> In the YARN web UI, the TM containers appear to be allocated for an instant, but then only the single JM container is left, and new TM
> containers are requested again. This cycle repeats until the job fails because it cannot be scheduled onto any resources.
>
> I looked up the exit code but could not find what 31 means. Could anyone help me analyze this? Many thanks!
>
>
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>
>
>


It seems the attachment cannot get through, so here is part of the log as a supplement.

Re: Re: blink job submitted to YARN keeps repeatedly allocating containers

Posted by 苏 欣 <se...@live.com>.
2019-04-09 09:58:03.012 [flink-akka.actor.default-dispatcher-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - SourceConversion(table:[builtin, default, sourceStreamTable, source: [KafkaJsonTableSource]], fields:(TI, EV, CS_HOST, DCS_ID)) -> Calc(select: (TI, EV, DCS_ID)) -> JoinTable(table: (JDBC[int_webtrends_prd_name], schema:{RowType{, types=[IntType, StringType, StringType], fieldNames=[prd_id, prd_name, ev]}}), joinType: LeftOuterJoin, join: (TI, EV, DCS_ID),  on: (TI=prd_name, EV=ev)) -> Calc(select: (prd_id, prd_name, 'APP' AS chl_id, DCS_ID)) -> SinkConversion to Tuple2 -> Filter -> Map -> Sink: JDBCRetractTableSink(prd_id, prd_name, chl_id, DCS_ID) (2/2) - execution #0 is assigned resource container_1554366508934_0084_01_000004_3 with d5363785f9f7f1965851ba54de935e7d
2019-04-09 09:58:03.012 [flink-akka.actor.default-dispatcher-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: KafkaJsonTableSource (3/4) - execution #0 is assigned resource container_1554366508934_0084_01_000004_1 with 1f1217c617a27211157883e437a6bd6e
2019-04-09 09:58:03.012 [flink-akka.actor.default-dispatcher-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: KafkaJsonTableSource (2/4) - execution #0 is assigned resource container_1554366508934_0084_01_000004_4 with cd351063b4a86182ad17b7b025541bb4
2019-04-09 09:58:03.013 [flink-akka.actor.default-dispatcher-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - SourceConversion(table:[builtin, default, sourceStreamTable, source: [KafkaJsonTableSource]], fields:(TI, EV, CS_HOST, DCS_ID)) -> Calc(select: (TI, EV, DCS_ID)) -> JoinTable(table: (JDBC[int_webtrends_prd_name], schema:{RowType{, types=[IntType, StringType, StringType], fieldNames=[prd_id, prd_name, ev]}}), joinType: LeftOuterJoin, join: (TI, EV, DCS_ID),  on: (TI=prd_name, EV=ev)) -> Calc(select: (prd_id, prd_name, 'APP' AS chl_id, DCS_ID)) -> SinkConversion to Tuple2 -> Filter -> Map -> Sink: JDBCRetractTableSink(prd_id, prd_name, chl_id, DCS_ID) (1/2) - execution #0 is assigned resource container_1554366508934_0084_01_000004_5 with e6742f700171d110fe27258bf77a789f
2019-04-09 09:58:03.013 [flink-akka.actor.default-dispatcher-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: KafkaJsonTableSource (4/4) - execution #0 is assigned resource container_1554366508934_0084_01_000004_2 with 408ccaad63574e30b5fe622b87822fc4
2019-04-09 09:58:03.013 [jobmanager-future-thread-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: KafkaJsonTableSource (1/4) (e972cd212cc3845b114deb5b86a229b6) switched from SCHEDULED to DEPLOYING.
2019-04-09 09:58:03.015 [jobmanager-future-thread-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Deploying Source: KafkaJsonTableSource (1/4) (attempt #0) to slot container_1554366508934_0084_01_000004_0 on bd129120
2019-04-09 09:58:03.016 [jobmanager-future-thread-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: KafkaJsonTableSource (2/4) (96a8bf35b6a85502b2b1afa400c2f0bd) switched from SCHEDULED to DEPLOYING.
2019-04-09 09:58:03.016 [jobmanager-future-thread-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Deploying Source: KafkaJsonTableSource (2/4) (attempt #0) to slot container_1554366508934_0084_01_000004_4 on bd129120
2019-04-09 09:58:03.016 [jobmanager-future-thread-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: KafkaJsonTableSource (3/4) (1f69f483771696ca5352edb9e172de6f) switched from SCHEDULED to DEPLOYING.
2019-04-09 09:58:03.017 [jobmanager-future-thread-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Deploying Source: KafkaJsonTableSource (3/4) (attempt #0) to slot container_1554366508934_0084_01_000004_1 on bd129120
2019-04-09 09:58:03.017 [jobmanager-future-thread-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: KafkaJsonTableSource (4/4) (8c5bfc27c50c37587d5b4439ab3a4a26) switched from SCHEDULED to DEPLOYING.
2019-04-09 09:58:03.017 [jobmanager-future-thread-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Deploying Source: KafkaJsonTableSource (4/4) (attempt #0) to slot container_1554366508934_0084_01_000004_2 on bd129120
2019-04-09 09:58:03.017 [jobmanager-future-thread-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - SourceConversion(table:[builtin, default, sourceStreamTable, source: [KafkaJsonTableSource]], fields:(TI, EV, CS_HOST, DCS_ID)) -> Calc(select: (TI, EV, DCS_ID)) -> JoinTable(table: (JDBC[int_webtrends_prd_name], schema:{RowType{, types=[IntType, StringType, StringType], fieldNames=[prd_id, prd_name, ev]}}), joinType: LeftOuterJoin, join: (TI, EV, DCS_ID),  on: (TI=prd_name, EV=ev)) -> Calc(select: (prd_id, prd_name, 'APP' AS chl_id, DCS_ID)) -> SinkConversion to Tuple2 -> Filter -> Map -> Sink: JDBCRetractTableSink(prd_id, prd_name, chl_id, DCS_ID) (1/2) (482b5c9e8f24afcac81e95372b1c02d3) switched from SCHEDULED to DEPLOYING.
2019-04-09 09:58:03.017 [jobmanager-future-thread-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Deploying SourceConversion(table:[builtin, default, sourceStreamTable, source: [KafkaJsonTableSource]], fields:(TI, EV, CS_HOST, DCS_ID)) -> Calc(select: (TI, EV, DCS_ID)) -> JoinTable(table: (JDBC[int_webtrends_prd_name], schema:{RowType{, types=[IntType, StringType, StringType], fieldNames=[prd_id, prd_name, ev]}}), joinType: LeftOuterJoin, join: (TI, EV, DCS_ID),  on: (TI=prd_name, EV=ev)) -> Calc(select: (prd_id, prd_name, 'APP' AS chl_id, DCS_ID)) -> SinkConversion to Tuple2 -> Filter -> Map -> Sink: JDBCRetractTableSink(prd_id, prd_name, chl_id, DCS_ID) (1/2) (attempt #0) to slot container_1554366508934_0084_01_000004_5 on bd129120
2019-04-09 09:58:03.017 [jobmanager-future-thread-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - SourceConversion(table:[builtin, default, sourceStreamTable, source: [KafkaJsonTableSource]], fields:(TI, EV, CS_HOST, DCS_ID)) -> Calc(select: (TI, EV, DCS_ID)) -> JoinTable(table: (JDBC[int_webtrends_prd_name], schema:{RowType{, types=[IntType, StringType, StringType], fieldNames=[prd_id, prd_name, ev]}}), joinType: LeftOuterJoin, join: (TI, EV, DCS_ID),  on: (TI=prd_name, EV=ev)) -> Calc(select: (prd_id, prd_name, 'APP' AS chl_id, DCS_ID)) -> SinkConversion to Tuple2 -> Filter -> Map -> Sink: JDBCRetractTableSink(prd_id, prd_name, chl_id, DCS_ID) (2/2) (4c0abbf6e5dc19b1bd51e650364a2c26) switched from SCHEDULED to DEPLOYING.
2019-04-09 09:58:03.018 [jobmanager-future-thread-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Deploying SourceConversion(table:[builtin, default, sourceStreamTable, source: [KafkaJsonTableSource]], fields:(TI, EV, CS_HOST, DCS_ID)) -> Calc(select: (TI, EV, DCS_ID)) -> JoinTable(table: (JDBC[int_webtrends_prd_name], schema:{RowType{, types=[IntType, StringType, StringType], fieldNames=[prd_id, prd_name, ev]}}), joinType: LeftOuterJoin, join: (TI, EV, DCS_ID),  on: (TI=prd_name, EV=ev)) -> Calc(select: (prd_id, prd_name, 'APP' AS chl_id, DCS_ID)) -> SinkConversion to Tuple2 -> Filter -> Map -> Sink: JDBCRetractTableSink(prd_id, prd_name, chl_id, DCS_ID) (2/2) (attempt #0) to slot container_1554366508934_0084_01_000004_3 on bd129120
2019-04-09 09:58:04.081 [flink-akka.actor.default-dispatcher-27] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - SourceConversion(table:[builtin, default, sourceStreamTable, source: [KafkaJsonTableSource]], fields:(TI, EV, CS_HOST, DCS_ID)) -> Calc(select: (TI, EV, DCS_ID)) -> JoinTable(table: (JDBC[int_webtrends_prd_name], schema:{RowType{, types=[IntType, StringType, StringType], fieldNames=[prd_id, prd_name, ev]}}), joinType: LeftOuterJoin, join: (TI, EV, DCS_ID),  on: (TI=prd_name, EV=ev)) -> Calc(select: (prd_id, prd_name, 'APP' AS chl_id, DCS_ID)) -> SinkConversion to Tuple2 -> Filter -> Map -> Sink: JDBCRetractTableSink(prd_id, prd_name, chl_id, DCS_ID) (1/2) (482b5c9e8f24afcac81e95372b1c02d3) switched from DEPLOYING to RUNNING.
2019-04-09 09:58:04.086 [flink-akka.actor.default-dispatcher-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - SourceConversion(table:[builtin, default, sourceStreamTable, source: [KafkaJsonTableSource]], fields:(TI, EV, CS_HOST, DCS_ID)) -> Calc(select: (TI, EV, DCS_ID)) -> JoinTable(table: (JDBC[int_webtrends_prd_name], schema:{RowType{, types=[IntType, StringType, StringType], fieldNames=[prd_id, prd_name, ev]}}), joinType: LeftOuterJoin, join: (TI, EV, DCS_ID),  on: (TI=prd_name, EV=ev)) -> Calc(select: (prd_id, prd_name, 'APP' AS chl_id, DCS_ID)) -> SinkConversion to Tuple2 -> Filter -> Map -> Sink: JDBCRetractTableSink(prd_id, prd_name, chl_id, DCS_ID) (2/2) (4c0abbf6e5dc19b1bd51e650364a2c26) switched from DEPLOYING to RUNNING.
2019-04-09 09:58:04.087 [flink-akka.actor.default-dispatcher-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: KafkaJsonTableSource (2/4) (96a8bf35b6a85502b2b1afa400c2f0bd) switched from DEPLOYING to RUNNING.
2019-04-09 09:58:04.089 [flink-akka.actor.default-dispatcher-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: KafkaJsonTableSource (3/4) (1f69f483771696ca5352edb9e172de6f) switched from DEPLOYING to RUNNING.
2019-04-09 09:58:04.091 [flink-akka.actor.default-dispatcher-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: KafkaJsonTableSource (1/4) (e972cd212cc3845b114deb5b86a229b6) switched from DEPLOYING to RUNNING.
2019-04-09 09:58:04.092 [flink-akka.actor.default-dispatcher-3] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: KafkaJsonTableSource (4/4) (8c5bfc27c50c37587d5b4439ab3a4a26) switched from DEPLOYING to RUNNING.
2019-04-09 09:58:07.508 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Container container_1554366508934_0084_01_000003 finished with exit code 31
2019-04-09 09:58:07.508 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Requesting new container with resources <memory:2048, vCores:6>. Number pending requests 2.
2019-04-09 09:58:07.509 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Container container_1554366508934_0084_01_000005 finished with exit code 31
2019-04-09 09:58:07.509 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Requesting new container with resources <memory:2048, vCores:6>. Number pending requests 3.
2019-04-09 09:58:08.012 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Received new container: container_1554366508934_0084_01_000006 - Remaining pending container requests: 2
2019-04-09 09:58:08.013 [pool-1-thread-4] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - TaskExecutor container_1554366508934_0084_01_000006 will be started with container size 2048 MB, JVM heap size 1920 MB, new generation size 480 MB, JVM direct memory limit 128 MB on cdh129139:8042
2019-04-09 09:58:08.015 [pool-1-thread-4] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Adding keytab hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/hive.keytab to the AM container local resource bucket
2019-04-09 09:58:08.047 [pool-1-thread-4] INFO  org.apache.flink.yarn.Utils  - Use the beforehand copied resource hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/taskmanager-conf.yaml (the corresponding local path: file:/home/yarn/nm/usercache/hive/appcache/application_1554366508934_0084/container_1554366508934_0084_01_000001/taskmanager-conf.yaml). Visibility: APPLICATION.
2019-04-09 09:58:08.157 [pool-1-thread-4] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Creating container launch context for TaskManagers
2019-04-09 09:58:08.158 [pool-1-thread-4] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Starting TaskManagers
2019-04-09 09:58:08.518 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Received new container: container_1554366508934_0084_01_000007 - Remaining pending container requests: 1
2019-04-09 09:58:08.519 [pool-1-thread-3] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - TaskExecutor container_1554366508934_0084_01_000007 will be started with container size 2048 MB, JVM heap size 1920 MB, new generation size 480 MB, JVM direct memory limit 128 MB on cdh129144:8042
2019-04-09 09:58:08.520 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Received new container: container_1554366508934_0084_01_000009 - Remaining pending container requests: 0
2019-04-09 09:58:08.520 [pool-1-thread-3] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Adding keytab hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/hive.keytab to the AM container local resource bucket
2019-04-09 09:58:08.521 [pool-1-thread-1] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - TaskExecutor container_1554366508934_0084_01_000009 will be started with container size 2048 MB, JVM heap size 1920 MB, new generation size 480 MB, JVM direct memory limit 128 MB on cdh129136:8042
2019-04-09 09:58:08.523 [pool-1-thread-1] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Adding keytab hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/hive.keytab to the AM container local resource bucket
2019-04-09 09:58:08.555 [pool-1-thread-1] INFO  org.apache.flink.yarn.Utils  - Use the beforehand copied resource hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/taskmanager-conf.yaml (the corresponding local path: file:/home/yarn/nm/usercache/hive/appcache/application_1554366508934_0084/container_1554366508934_0084_01_000001/taskmanager-conf.yaml). Visibility: APPLICATION.
2019-04-09 09:58:08.555 [pool-1-thread-3] INFO  org.apache.flink.yarn.Utils  - Use the beforehand copied resource hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/taskmanager-conf.yaml (the corresponding local path: file:/home/yarn/nm/usercache/hive/appcache/application_1554366508934_0084/container_1554366508934_0084_01_000001/taskmanager-conf.yaml). Visibility: APPLICATION.
2019-04-09 09:58:08.661 [pool-1-thread-1] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Creating container launch context for TaskManagers
2019-04-09 09:58:08.661 [pool-1-thread-3] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Creating container launch context for TaskManagers
2019-04-09 09:58:08.662 [pool-1-thread-1] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Starting TaskManagers
2019-04-09 09:58:08.662 [pool-1-thread-3] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Starting TaskManagers
2019-04-09 09:58:14.023 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Container container_1554366508934_0084_01_000006 finished with exit code 31
2019-04-09 09:58:14.024 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Requesting new container with resources <memory:2048, vCores:6>. Number pending requests 1.
2019-04-09 09:58:14.024 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Container container_1554366508934_0084_01_000007 finished with exit code 31
2019-04-09 09:58:14.024 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Requesting new container with resources <memory:2048, vCores:6>. Number pending requests 2.
2019-04-09 09:58:14.024 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Container container_1554366508934_0084_01_000009 finished with exit code 31
2019-04-09 09:58:14.025 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Requesting new container with resources <memory:2048, vCores:6>. Number pending requests 3.
2019-04-09 09:58:19.533 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Received new container: container_1554366508934_0084_01_000014 - Remaining pending container requests: 2
2019-04-09 09:58:19.534 [pool-1-thread-2] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - TaskExecutor container_1554366508934_0084_01_000014 will be started with container size 2048 MB, JVM heap size 1920 MB, new generation size 480 MB, JVM direct memory limit 128 MB on cdh129139:8042
2019-04-09 09:58:19.534 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Received new container: container_1554366508934_0084_01_000015 - Remaining pending container requests: 1
2019-04-09 09:58:19.535 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Received new container: container_1554366508934_0084_01_000016 - Remaining pending container requests: 0
2019-04-09 09:58:19.535 [pool-1-thread-2] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Adding keytab hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/hive.keytab to the AM container local resource bucket
2019-04-09 09:58:19.535 [pool-1-thread-5] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - TaskExecutor container_1554366508934_0084_01_000015 will be started with container size 2048 MB, JVM heap size 1920 MB, new generation size 480 MB, JVM direct memory limit 128 MB on cdh129144:8042
2019-04-09 09:58:19.535 [pool-1-thread-4] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - TaskExecutor container_1554366508934_0084_01_000016 will be started with container size 2048 MB, JVM heap size 1920 MB, new generation size 480 MB, JVM direct memory limit 128 MB on cdh129136:8042
2019-04-09 09:58:19.536 [pool-1-thread-5] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Adding keytab hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/hive.keytab to the AM container local resource bucket
2019-04-09 09:58:19.536 [pool-1-thread-4] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Adding keytab hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/hive.keytab to the AM container local resource bucket
2019-04-09 09:58:19.565 [pool-1-thread-4] INFO  org.apache.flink.yarn.Utils  - Use the beforehand copied resource hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/taskmanager-conf.yaml (the corresponding local path: file:/home/yarn/nm/usercache/hive/appcache/application_1554366508934_0084/container_1554366508934_0084_01_000001/taskmanager-conf.yaml). Visibility: APPLICATION.
2019-04-09 09:58:19.565 [pool-1-thread-5] INFO  org.apache.flink.yarn.Utils  - Use the beforehand copied resource hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/taskmanager-conf.yaml (the corresponding local path: file:/home/yarn/nm/usercache/hive/appcache/application_1554366508934_0084/container_1554366508934_0084_01_000001/taskmanager-conf.yaml). Visibility: APPLICATION.
2019-04-09 09:58:19.565 [pool-1-thread-2] INFO  org.apache.flink.yarn.Utils  - Use the beforehand copied resource hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/taskmanager-conf.yaml (the corresponding local path: file:/home/yarn/nm/usercache/hive/appcache/application_1554366508934_0084/container_1554366508934_0084_01_000001/taskmanager-conf.yaml). Visibility: APPLICATION.
2019-04-09 09:58:19.665 [pool-1-thread-2] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Creating container launch context for TaskManagers
2019-04-09 09:58:19.665 [pool-1-thread-4] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Creating container launch context for TaskManagers
2019-04-09 09:58:19.665 [pool-1-thread-5] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Creating container launch context for TaskManagers
2019-04-09 09:58:19.665 [pool-1-thread-2] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Starting TaskManagers
2019-04-09 09:58:19.665 [pool-1-thread-4] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Starting TaskManagers
2019-04-09 09:58:19.665 [pool-1-thread-5] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Starting TaskManagers
2019-04-09 09:58:25.037 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Container container_1554366508934_0084_01_000014 finished with exit code 31
2019-04-09 09:58:25.038 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Requesting new container with resources <memory:2048, vCores:6>. Number pending requests 1.
2019-04-09 09:58:25.038 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Container container_1554366508934_0084_01_000015 finished with exit code 31
2019-04-09 09:58:25.039 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Requesting new container with resources <memory:2048, vCores:6>. Number pending requests 2.
2019-04-09 09:58:25.039 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Container container_1554366508934_0084_01_000016 finished with exit code 31
2019-04-09 09:58:25.039 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Requesting new container with resources <memory:2048, vCores:6>. Number pending requests 3.
2019-04-09 09:58:30.547 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Received new container: container_1554366508934_0084_01_000017 - Remaining pending container requests: 2
2019-04-09 09:58:30.548 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Received new container: container_1554366508934_0084_01_000018 - Remaining pending container requests: 1
2019-04-09 09:58:30.548 [AMRM Callback Handler Thread] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Received new container: container_1554366508934_0084_01_000019 - Remaining pending container requests: 0
2019-04-09 09:58:30.548 [pool-1-thread-6] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - TaskExecutor container_1554366508934_0084_01_000017 will be started with container size 2048 MB, JVM heap size 1920 MB, new generation size 480 MB, JVM direct memory limit 128 MB on cdh129139:8042
2019-04-09 09:58:30.549 [pool-1-thread-3] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - TaskExecutor container_1554366508934_0084_01_000019 will be started with container size 2048 MB, JVM heap size 1920 MB, new generation size 480 MB, JVM direct memory limit 128 MB on cdh129136:8042
2019-04-09 09:58:30.548 [pool-1-thread-7] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - TaskExecutor container_1554366508934_0084_01_000018 will be started with container size 2048 MB, JVM heap size 1920 MB, new generation size 480 MB, JVM direct memory limit 128 MB on cdh129144:8042
2019-04-09 09:58:30.549 [pool-1-thread-6] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Adding keytab hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/hive.keytab to the AM container local resource bucket
2019-04-09 09:58:30.550 [pool-1-thread-3] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Adding keytab hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/hive.keytab to the AM container local resource bucket
2019-04-09 09:58:30.550 [pool-1-thread-7] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Adding keytab hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/hive.keytab to the AM container local resource bucket
2019-04-09 09:58:30.575 [pool-1-thread-3] INFO  org.apache.flink.yarn.Utils  - Use the beforehand copied resource hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/taskmanager-conf.yaml (the corresponding local path: file:/home/yarn/nm/usercache/hive/appcache/application_1554366508934_0084/container_1554366508934_0084_01_000001/taskmanager-conf.yaml). Visibility: APPLICATION.
2019-04-09 09:58:30.576 [pool-1-thread-6] INFO  org.apache.flink.yarn.Utils  - Use the beforehand copied resource hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/taskmanager-conf.yaml (the corresponding local path: file:/home/yarn/nm/usercache/hive/appcache/application_1554366508934_0084/container_1554366508934_0084_01_000001/taskmanager-conf.yaml). Visibility: APPLICATION.
2019-04-09 09:58:30.576 [pool-1-thread-7] INFO  org.apache.flink.yarn.Utils  - Use the beforehand copied resource hdfs://cdh129130:8020/user/hive/.flink/application_1554366508934_0084/taskmanager-conf.yaml (the corresponding local path: file:/home/yarn/nm/usercache/hive/appcache/application_1554366508934_0084/container_1554366508934_0084_01_000001/taskmanager-conf.yaml). Visibility: APPLICATION.
2019-04-09 09:58:30.661 [pool-1-thread-6] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Creating container launch context for TaskManagers
2019-04-09 09:58:30.661 [pool-1-thread-7] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Creating container launch context for TaskManagers
2019-04-09 09:58:30.661 [pool-1-thread-3] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Creating container launch context for TaskManagers
2019-04-09 09:58:30.662 [pool-1-thread-6] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Starting TaskManagers
2019-04-09 09:58:30.662 [pool-1-thread-7] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Starting TaskManagers
2019-04-09 09:58:30.662 [pool-1-thread-3] INFO  org.apache.flink.yarn.YarnSessionResourceManager  - Starting TaskManagers
