Posted to user@spark.apache.org by Praveen Kumar Ramachandran <n....@gmail.com> on 2020/06/22 11:45:36 UTC
Reg - Why does Apache Hadoop need to be installed separately for running Apache Spark?
I'm learning Apache Spark and trying to run a basic Spark program written
in Java. I've installed Apache Spark *(spark-2.4.3-bin-without-hadoop)*,
downloaded from https://spark.apache.org/.
I've created a Maven project in Eclipse and added the following dependency:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.3</version>
</dependency>
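For context, here is a minimal sketch of the kind of program being described: a Spark job built against spark-core_2.11:2.4.3 that sets the master to "local" through the Spark config. The class name and the counting logic are illustrative, not taken from the original post, and the code needs the Maven dependency above on the classpath to compile and run:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Illustrative minimal Spark job; "BasicSparkApp" is a made-up name.
public class BasicSparkApp {
    public static void main(String[] args) {
        // Run entirely in-process ("local" master), no cluster required.
        SparkConf conf = new SparkConf()
                .setAppName("BasicSparkApp")
                .setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Trivial action to force job execution.
        long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
        System.out.println("count = " + count);

        sc.close();
    }
}
```

Even a local-mode job like this goes through Spark's Hadoop-based filesystem and configuration classes on startup, which is where the error below is raised when a "without-hadoop" build can't find a Hadoop installation.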
After building the project, I tried to run the program by setting
sparkMaster=local through the Spark config, and I encountered the
following error:
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
After referring to some sites, I installed hadoop-2.7.7 and added
HADOOP_HOME to my .bash_profile. Now I'm able to execute my Spark program!
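For anyone hitting the same error, the relevant .bash_profile lines look roughly like this. The install path is an assumption; point it at wherever hadoop-2.7.7 was actually unpacked:

```shell
# Assumed install location -- adjust to your actual hadoop-2.7.7 path.
export HADOOP_HOME="$HOME/hadoop-2.7.7"
export PATH="$PATH:$HADOOP_HOME/bin"
```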
*Now I need to know where and why Hadoop is necessary for Spark.*
I posted the same question on Stack Overflow a while back, but still
haven't received a response:
https://stackoverflow.com/questions/57435163/why-apache-hadoop-need-to-be-installed-for-running-apache-spark
Regards,
Praveen Kumar Ramachandran