Posted to user@spark.apache.org by Praveen Kumar Ramachandran <n....@gmail.com> on 2020/06/22 11:45:36 UTC

Reg - Why does Apache Hadoop need to be installed separately for running Apache Spark?

I'm learning Apache Spark and trying to run a basic Spark program written
in Java. I've installed Apache Spark *(spark-2.4.3-bin-without-hadoop)*,
downloaded from https://spark.apache.org/.

I've created a Maven project in Eclipse and added the following dependency:

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.4.3</version>
    </dependency>
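
For reference, the program is roughly the following; the class name,
application name, and input path are just placeholders for this sketch:

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    // Minimal example; class name, app name, and input path are placeholders.
    public class SimpleSparkApp {
      public static void main(String[] args) {
        // Run Spark locally inside this JVM (no cluster manager involved).
        SparkConf conf = new SparkConf()
            .setAppName("SimpleSparkApp")
            .setMaster("local");

        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read a local text file and count the words in it.
        JavaRDD<String> lines = sc.textFile("input.txt");
        JavaRDD<String> words = lines.flatMap(
            line -> Arrays.asList(line.split("\\s+")).iterator());
        System.out.println("word count = " + words.count());

        sc.close();
      }
    }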

After building the project, I tried to run the program with the Spark
master set to "local" through the SparkConf (as in the sketch above), and
ran into the following error:

    java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.

After referring to a few sites, I installed hadoop-2.7.7 and set
"HADOOP_HOME" in my .bash_profile.

Now I'm able to execute my Spark program!
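
As an aside, the error also mentions the hadoop.home.dir system property,
so I believe the same thing could be done from inside the program instead
of .bash_profile; a rough, unverified sketch (the path is a placeholder):

    // Rough sketch (unverified): point Hadoop at the local install via the
    // hadoop.home.dir system property named in the error message, before
    // any Spark/Hadoop classes are used. The path is a placeholder.
    public class Main {
      public static void main(String[] args) {
        System.setProperty("hadoop.home.dir", "/opt/hadoop-2.7.7");
        // ... build the SparkConf / JavaSparkContext as usual after this point
      }
    }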


*Now I need to know: where and how is Hadoop necessary for Spark?*

I posted the same question on Stack Overflow a while back, but still
haven't received a response:
https://stackoverflow.com/questions/57435163/why-apache-hadoop-need-to-be-installed-for-running-apache-spark

Regards,
Praveen Kumar Ramachandran