You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2021/10/26 08:10:29 UTC
[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #6605: yarn applications: application_1634958933716_0113 , query status failed
github-actions[bot] commented on issue #6605:
URL: https://github.com/apache/dolphinscheduler/issues/6605#issuecomment-951671505
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
### What happened
### operate
when I run a shell for testing mapreduce in ds (fee image below),ds web log shows : yarn status get failed.
shell content :
`hadoop jar /opt/app/hadoop-2.9.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar pi 10 10`
![360截图17520712172535](https://user-images.githubusercontent.com/20452924/138829616-ce15ed7b-6ad1-46b3-be4d-c45025294689.png)
### ds web log
[INFO] 2021-10-26 10:34:28.745 - [taskAppId=TASK-1-6-89]:[115] - create dir success /exec/process/1/1/6/89
[INFO] 2021-10-26 10:34:28.754 - [taskAppId=TASK-1-6-89]:[88] - shell task params {"rawScript":"hadoop jar /opt/app/hadoop-2.9.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar pi 10 10","localParams":[],"resourceList":[]}
[INFO] 2021-10-26 10:34:28.758 - [taskAppId=TASK-1-6-89]:[154] - raw script : hadoop jar /opt/app/hadoop-2.9.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar pi 10 10
[INFO] 2021-10-26 10:34:28.759 - [taskAppId=TASK-1-6-89]:[155] - task execute path : /exec/process/1/1/6/89
[INFO] 2021-10-26 10:34:28.760 - [taskAppId=TASK-1-6-89]:[87] - tenantCode user:root, task dir:1_6_89
[INFO] 2021-10-26 10:34:28.760 - [taskAppId=TASK-1-6-89]:[92] - create command file:/exec/process/1/1/6/89/1_6_89.command
[INFO] 2021-10-26 10:34:28.760 - [taskAppId=TASK-1-6-89]:[111] - command : #!/bin/sh
BASEDIR=$(cd `dirname $0`; pwd)
cd $BASEDIR
source /opt/app/dolphinscheduler/conf/env/dolphinscheduler_env.sh
/exec/process/1/1/6/89/1_6_89_node.sh
[INFO] 2021-10-26 10:34:28.764 - [taskAppId=TASK-1-6-89]:[330] - task run command:
sudo -u root sh /exec/process/1/1/6/89/1_6_89.command
[INFO] 2021-10-26 10:34:28.773 - [taskAppId=TASK-1-6-89]:[211] - process start, process id is: 19627
[INFO] 2021-10-26 10:34:29.774 - [taskAppId=TASK-1-6-89]:[138] - -> SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/app/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/app/tez/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Number of Maps = 10
Samples per Map = 10
[INFO] 2021-10-26 10:34:31.775 - [taskAppId=TASK-1-6-89]:[138] - -> Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
21/10/26 10:34:31 INFO client.RMProxy: Connecting to ResourceManager at hadoop47/192.168.80.47:8032
[INFO] 2021-10-26 10:34:32.776 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:32 INFO input.FileInputFormat: Total input files to process : 10
21/10/26 10:34:32 INFO mapreduce.JobSubmitter: number of splits:10
21/10/26 10:34:32 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
21/10/26 10:34:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1634958933716_0113
21/10/26 10:34:32 INFO impl.YarnClientImpl: Submitted application application_1634958933716_0113
21/10/26 10:34:32 INFO mapreduce.Job: The url to track the job: http://hadoop47:8088/proxy/application_1634958933716_0113/
21/10/26 10:34:32 INFO mapreduce.Job: Running job: job_1634958933716_0113
[INFO] 2021-10-26 10:34:40.785 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:39 INFO mapreduce.Job: Job job_1634958933716_0113 running in uber mode : false
21/10/26 10:34:39 INFO mapreduce.Job: map 0% reduce 0%
[INFO] 2021-10-26 10:34:56.789 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:56 INFO mapreduce.Job: map 30% reduce 0%
[INFO] 2021-10-26 10:34:57.790 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:57 INFO mapreduce.Job: map 100% reduce 0%
[INFO] 2021-10-26 10:35:02.715 - [taskAppId=TASK-1-6-89]:[445] - find app id: application_1634958933716_0113
[INFO] 2021-10-26 10:35:02.715 - [taskAppId=TASK-1-6-89]:[402] - check yarn application status, appId:application_1634958933716_0113
[ERROR] 2021-10-26 10:35:02.720 - [taskAppId=TASK-1-6-89]:[418] - yarn applications: application_1634958933716_0113 , query status failed, exception:{}
java.lang.NullPointerException: null
at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:423)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:404)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:101)
at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[INFO] 2021-10-26 10:35:02.720 - [taskAppId=TASK-1-6-89]:[238] - process has exited, execute path:/exec/process/1/1/6/89, processId:19627 ,exitStatusCode:-1 ,processWaitForStatus:true ,processExitValue:0
[INFO] 2021-10-26 10:35:02.791 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:35:02 INFO mapreduce.Job: map 100% reduce 100%
21/10/26 10:35:02 INFO mapreduce.Job: Job job_1634958933716_0113 completed successfully
21/10/26 10:35:02 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=226
FILE: Number of bytes written=2205654
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2630
HDFS: Number of bytes written=215
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=149819
Total time spent by all reduces in occupied slots (ms)=3113
Total time spent by all map tasks (ms)=149819
Total time spent by all reduce tasks (ms)=3113
Total vcore-milliseconds taken by all map tasks=149819
Total vcore-milliseconds taken by all reduce tasks=3113
Total megabyte-milliseconds taken by all map tasks=153414656
Total megabyte-milliseconds taken by all reduce tasks=3187712
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=280
Input split bytes=1450
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=280
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=6825
CPU time spent (ms)=4980
Physical memory (bytes) snapshot=3529900032
Virtual memory (bytes) snapshot=22377988096
Total committed heap usage (bytes)=2413297664
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1180
File Output Format Counters
Bytes Written=97
Job Finished in 30.695 seconds
Estimated value of Pi is 3.20000000000000000000
### worker debug log
[DEBUG] 2021-10-26 10:34:56.708 org.apache.zookeeper.ClientCnxn:[846] - Reading reply sessionid:0x20015bfe8a400c9, packet:: clientPath:/dolphinscheduler/nodes/worker/default/192.168.80.49:1234 serverPath:/dolphinscheduler/nodes/worker/default/192.168.80.49:1234 finished:false header:: 2933,4 replyHeader:: 2933,17180717039,0 request:: '/dolphinscheduler/nodes/worker/default/192.168.80.49:1234,T response:: #302e332c302e39312c302e35392c312e33372c382e302c302e332c323032312d31302d32362030393a32373a30362c323032312d31302d32362031303a33343a35362c302c34303937,s{17180707701,17180717039,1635211626683,1635215696700,407,0,0,144139102061854920,73,0,17180707701}
[DEBUG] 2021-10-26 10:34:56.708 org.apache.dolphinscheduler.service.zk.ZookeeperCachedOperator:[62] - zookeeperListener:org.apache.dolphinscheduler.server.master.registry.ServerNodeManager$WorkerGroupNodeListener triggered
[DEBUG] 2021-10-26 10:34:56.709 org.apache.curator.framework.recipes.cache.TreeCache:[396] - processResult: CuratorEventImpl{type=GET_DATA, resultCode=0, path='/dolphinscheduler/nodes/worker/default/192.168.80.49:1234', name='null', children=null, context=null, stat=17180707701,17180717039,1635211626683,1635215696700,407,0,0,144139102061854920,73,0,17180707701
, data=[48, 46, 51, 44, 48, 46, 57, 49, 44, 48, 46, 53, 57, 44, 49, 46, 51, 55, 44, 56, 46, 48, 44, 48, 46, 51, 44, 50, 48, 50, 49, 45, 49, 48, 45, 50, 54, 32, 48, 57, 58, 50, 55, 58, 48, 54, 44, 50, 48, 50, 49, 45, 49, 48, 45, 50, 54, 32, 49, 48, 58, 51, 52, 58, 53, 54, 44, 48, 44, 52, 48, 57, 55], watchedEvent=null, aclList=null, opResults=null}
[DEBUG] 2021-10-26 10:34:56.709 org.apache.curator.framework.recipes.cache.TreeCache:[857] - publishEvent: TreeCacheEvent{type=NODE_UPDATED, data=ChildData{path='/dolphinscheduler/nodes/worker/default/192.168.80.49:1234', stat=17180707701,17180717039,1635211626683,1635215696700,407,0,0,144139102061854920,73,0,17180707701
, data=[48, 46, 51, 44, 48, 46, 57, 49, 44, 48, 46, 53, 57, 44, 49, 46, 51, 55, 44, 56, 46, 48, 44, 48, 46, 51, 44, 50, 48, 50, 49, 45, 49, 48, 45, 50, 54, 32, 48, 57, 58, 50, 55, 58, 48, 54, 44, 50, 48, 50, 49, 45, 49, 48, 45, 50, 54, 32, 49, 48, 58, 51, 52, 58, 53, 54, 44, 48, 44, 52, 48, 57, 55]}}
[INFO] 2021-10-26 10:34:56.789 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:56 INFO mapreduce.Job: map 30% reduce 0%
[INFO] 2021-10-26 10:34:57.790 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:34:57 INFO mapreduce.Job: map 100% reduce 0%
[DEBUG] 2021-10-26 10:34:58.313 org.apache.zookeeper.ClientCnxn:[745] - Got ping response for sessionid: 0x30015c0a38d009d after 0ms
[INFO] 2021-10-26 10:35:02.715 - [taskAppId=TASK-1-6-89]:[445] - find app id: application_1634958933716_0113
[INFO] 2021-10-26 10:35:02.715 - [taskAppId=TASK-1-6-89]:[402] - check yarn application status, appId:application_1634958933716_0113
[DEBUG] 2021-10-26 10:35:02.715 org.apache.dolphinscheduler.common.utils.HadoopUtils:[211] - yarn application url:http://hadoop47:%s/ws/v1/cluster/apps/%s, applicationId:application_1634958933716_0113
[ERROR] 2021-10-26 10:35:02.720 org.apache.dolphinscheduler.common.utils.HttpUtils:[73] - Connect to hadoop47:80 [hadoop47/192.168.80.47] failed: Connection refused (Connection refused)
org.apache.http.conn.HttpHostConnectException: Connect to hadoop47:80 [hadoop47/192.168.80.47] failed: Connection refused (Connection refused)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at org.apache.dolphinscheduler.common.utils.HttpUtils.get(HttpUtils.java:60)
at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:420)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:404)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:101)
at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:476)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:218)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:394)
at java.net.Socket.connect(Socket.java:606)
at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:74)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)
... 20 common frames omitted
[ERROR] 2021-10-26 10:35:02.720 - [taskAppId=TASK-1-6-89]:[418] - yarn applications: application_1634958933716_0113 , query status failed, exception:{}
java.lang.NullPointerException: null
at org.apache.dolphinscheduler.common.utils.HadoopUtils.getApplicationStatus(HadoopUtils.java:423)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.isSuccessOfYarnState(AbstractCommandExecutor.java:404)
at org.apache.dolphinscheduler.server.worker.task.AbstractCommandExecutor.run(AbstractCommandExecutor.java:230)
at org.apache.dolphinscheduler.server.worker.task.shell.ShellTask.handle(ShellTask.java:101)
at org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread.run(TaskExecuteThread.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[INFO] 2021-10-26 10:35:02.720 - [taskAppId=TASK-1-6-89]:[238] - process has exited, execute path:/exec/process/1/1/6/89, processId:19627 ,exitStatusCode:-1 ,processWaitForStatus:true ,processExitValue:0
[INFO] 2021-10-26 10:35:02.720 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[147] - task instance id : 89,task final status : FAILURE
[INFO] 2021-10-26 10:35:02.721 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[185] - develop mode is: false
[INFO] 2021-10-26 10:35:02.721 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[203] - exec local path: /exec/process/1/1/6/89 cleared.
[INFO] 2021-10-26 10:35:02.791 - [taskAppId=TASK-1-6-89]:[138] - -> 21/10/26 10:35:02 INFO mapreduce.Job: map 100% reduce 100%
21/10/26 10:35:02 INFO mapreduce.Job: Job job_1634958933716_0113 completed successfully
21/10/26 10:35:02 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=226
FILE: Number of bytes written=2205654
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2630
HDFS: Number of bytes written=215
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=149819
Total time spent by all reduces in occupied slots (ms)=3113
Total time spent by all map tasks (ms)=149819
Total time spent by all reduce tasks (ms)=3113
Total vcore-milliseconds taken by all map tasks=149819
Total vcore-milliseconds taken by all reduce tasks=3113
Total megabyte-milliseconds taken by all map tasks=153414656
Total megabyte-milliseconds taken by all reduce tasks=3187712
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=280
Input split bytes=1450
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=280
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=6825
CPU time spent (ms)=4980
Physical memory (bytes) snapshot=3529900032
Virtual memory (bytes) snapshot=22377988096
Total committed heap usage (bytes)=2413297664
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1180
File Output Format Counters
Bytes Written=97
Job Finished in 30.695 seconds
Estimated value of Pi is 3.20000000000000000000
### What you expected to happen
Yarn application_1634958933716_0113 status can always be get;
![360截图16380508205068](https://user-images.githubusercontent.com/20452924/138833771-460534f6-a830-44d1-86aa-5970217eb712.png)
### How to reproduce
Server: KunPeng
OS centos7
DS release:1.3.9
Hadoop version :2.9.2
Yarn Ha: False
conf/common.properties
# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids=
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://hadoop47:%s/ws/v1/cluster/apps/%s
### Anything else
some times fail ; high probability of this error
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org