Posted to issues@spark.apache.org by "Yuechen Chen (JIRA)" <ji...@apache.org> on 2017/05/05 06:12:04 UTC
[jira] [Created] (SPARK-20608) Standby namenodes should be allowed
to be included in spark.yarn.access.namenodes to support HDFS HA
Yuechen Chen created SPARK-20608:
------------------------------------
Summary: Standby namenodes should be allowed to be included in spark.yarn.access.namenodes to support HDFS HA
Key: SPARK-20608
URL: https://issues.apache.org/jira/browse/SPARK-20608
Project: Spark
Issue Type: Bug
Components: Spark Submit, YARN
Affects Versions: 2.1.0, 2.0.1
Reporter: Yuechen Chen
If a Spark application needs to access remote namenodes, ${spark.yarn.access.namenodes} only has to be configured in the spark-submit script, and the Spark client (on YARN) will fetch HDFS credentials periodically.
If a Hadoop cluster is configured for HA, there is one active namenode and at least one standby namenode.
However, if ${spark.yarn.access.namenodes} includes both the active and the standby namenodes, the Spark application fails, because Spark cannot access the standby namenode and the request ends in org.apache.hadoop.ipc.StandbyException.
I think configuring standby namenodes in ${spark.yarn.access.namenodes} would have no harmful effect, and it would let my Spark application survive a failover of the Hadoop namenode.
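The tolerant behaviour requested above could look roughly like the sketch below. It is plain Python with no Hadoop or Spark dependency and is not Spark's actual credential code: StandbyError, fetch_tokens, and fake_obtain_token are hypothetical names standing in for org.apache.hadoop.ipc.StandbyException and for the real token request. The idea is only that a standby namenode in the configured list is skipped instead of failing the whole application.

```python
class StandbyError(Exception):
    """Hypothetical stand-in for org.apache.hadoop.ipc.StandbyException."""


def fetch_tokens(namenodes, obtain_token):
    """Collect a token from every namenode that answers, skipping standbys.

    `obtain_token` is a hypothetical callable (namenode URI -> token) that
    raises StandbyError when the namenode is in standby state.
    """
    tokens = {}
    skipped = []
    for uri in namenodes:
        try:
            tokens[uri] = obtain_token(uri)
        except StandbyError:
            # Tolerate the standby instead of failing the whole application.
            skipped.append(uri)
    if not tokens:
        # No configured namenode handed out a token: report a real error.
        raise RuntimeError("no active namenode among: " + ", ".join(namenodes))
    return tokens, skipped


def fake_obtain_token(uri):
    # Simulated cluster state for this example: namenode02 is the standby.
    if uri == "hdfs://namenode02":
        raise StandbyError(uri)
    return "token-for-" + uri


tokens, skipped = fetch_tokens(
    ["hdfs://namenode01", "hdfs://namenode02"], fake_obtain_token
)
```

Raising only when no namenode returns a token keeps genuine misconfiguration visible while still allowing both members of an HA pair to be listed.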
HA Examples:
spark-submit script: spark.yarn.access.namenodes=hdfs://namenode01,hdfs://namenode02
Spark Application:
dataframe.write.parquet(getActiveNameNode(...) + hdfsPath)
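The report does not show how getActiveNameNode(...) is implemented. One possible shape, sketched in plain Python under stated assumptions, is to probe each configured namenode and use the first one that reports itself active. The probe is_active, the helper build_output_path, and the path /data/out.parquet are all hypothetical and exist only for illustration. A real probe would have to query the cluster.

```python
def get_active_namenode(namenodes, is_active):
    """Return the first namenode URI for which `is_active` reports True.

    `is_active` is a hypothetical probe (namenode URI -> bool); how the
    state is actually queried is outside the scope of this sketch.
    """
    for uri in namenodes:
        if is_active(uri):
            return uri
    raise LookupError("no active namenode found")


def build_output_path(namenodes, is_active, hdfs_path):
    # Prefix the HDFS path with whichever namenode is currently active.
    return get_active_namenode(namenodes, is_active) + hdfs_path


# Simulated state after a failover: namenode02 has become active.
state = {"hdfs://namenode01": False, "hdfs://namenode02": True}
path = build_output_path(list(state), state.get, "/data/out.parquet")
```

Resolving the active namenode at write time, rather than once at startup, is what lets the application keep writing after a failover.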