Posted to issues@ambari.apache.org by "Akhil S Naik (JIRA)" <ji...@apache.org> on 2019/05/24 10:31:00 UTC
[jira] [Created] (AMBARI-25285) Ambari always copies and overwrites mapreduce.tar.gz to hdfs when WebHDFS is not enabled while restarting HiveServer
Akhil S Naik created AMBARI-25285:
-------------------------------------
Summary: Ambari always copies and overwrites mapreduce.tar.gz to hdfs when WebHDFS is not enabled while restarting HiveServer
Key: AMBARI-25285
URL: https://issues.apache.org/jira/browse/AMBARI-25285
Project: Ambari
Issue Type: Bug
Components: ambari-agent
Affects Versions: 2.7.3
Reporter: Akhil S Naik
Assignee: Akhil S Naik
Problem Statement:
When HiveServer2 is restarted, the startup python script will try to copy /usr/hdp/<version>/hadoop/mapreduce.tar.gz to /hdp/apps/<version>/mapreduce/mapreduce.tar.gz
MapReduce jobs will fail if a YARN application moves from the ACCEPTED to the RUNNING state at the same moment the mapreduce.tar.gz copy triggered by the HiveServer2 restart is in progress.
When WebHDFS is enabled, this problem never occurs because Ambari skips the copy, and the following line appears in the log:
{code:java}
2019-05-23 10:11:18,371 - DFS file /hdp/apps/2.6.5.0-292/mapreduce/mapreduce.tar.gz is identical to /usr/hdp/2.6.5.0-292/hadoop/mapreduce.tar.gz, skipping the copying
{code}
When WebHDFS is disabled in the cluster, this line is not printed when starting HiveServer2: the existing mapreduce.tar.gz is simply overwritten without any check.
Analysis:
The issue appears to be in this part of the code: https://github.com/apache/ambari/blob/4eee0f56d2fbfdfb0caace955339bc0c46a85a3c/contrib/fast-hdfs-resource/src/main/java/org/apache/ambari/fast_hdfs_resource/Runner.java#L131
https://github.com/apache/ambari/blob/4eee0f56d2fbfdfb0caace955339bc0c46a85a3c/contrib/fast-hdfs-resource/src/main/java/org/apache/ambari/fast_hdfs_resource/Resource.java#L236
We simply create the file, overwriting it if it already exists.
We should check whether the destination file already exists before the copy operation and skip the copy if the files are identical.
This will save time when starting HiveServer2 and also avoid spurious MapReduce job failures.
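A minimal sketch of the proposed skip logic, using plain java.nio against local files for illustration (the actual fix in fast-hdfs-resource would use Hadoop's FileSystem API against HDFS; the class and method names below are hypothetical, not the Ambari code):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class SkipIdenticalCopy {

    // Cheap length comparison first; only hash contents when lengths match.
    static boolean isIdentical(Path src, Path dest) throws IOException {
        if (!Files.exists(dest) || Files.size(src) != Files.size(dest)) {
            return false;
        }
        return Arrays.equals(md5(src), md5(dest));
    }

    static byte[] md5(Path p) throws IOException {
        try {
            return MessageDigest.getInstance("MD5").digest(Files.readAllBytes(p));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Copy only when the destination is missing or differs from the source;
    // returns true if a copy actually happened.
    static boolean copyIfNeeded(Path src, Path dest) throws IOException {
        if (isIdentical(src, dest)) {
            System.out.println("DFS file " + dest + " is identical to "
                + src + ", skipping the copying");
            return false;
        }
        Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }
}
{code}

In the real code path the same check could compare the local tarball against the HDFS copy (e.g. via file length and checksum) before the create-and-overwrite call, mirroring the skip behavior the WebHDFS path already has.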
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)