Posted to user@spark.apache.org by N B <nb...@gmail.com> on 2017/06/21 05:51:40 UTC

Spark 2.1.1 and Hadoop version 2.2 or 2.7?

I downloaded the pre-built package labeled "Spark 2.1.1 prebuilt with
Hadoop 2.7 or later" from the direct download link on spark.apache.org.

However, I am seeing compatibility errors when running against a deployed
HDFS 2.7.3. (See my earlier message about Flume DStream producing 0 records
after an HDFS node restarted.) I have been digging into this issue and have
started to suspect a version mismatch between the Hadoop server and client.
I decided to look at Spark 2.1.1's pom.xml. It states hadoop.version as
2.2.0. There seems to be a mismatch here, and I am not sure whether it is
the root cause of the issues I have been seeing.

Can someone please confirm whether the package mentioned above was indeed
compiled against Hadoop 2.7? Or should I fall back to an HDFS 2.2 server
instead?

Thanks
N B

Re: Spark 2.1.1 and Hadoop version 2.2 or 2.7?

Posted by yohann jardin <yo...@hotmail.com>.
https://spark.apache.org/docs/2.1.0/building-spark.html#specifying-the-hadoop-version

Hadoop 2.2.0 is only the default version used when building Spark from source; other versions can be selected at build time. The package you downloaded is prebuilt for Hadoop 2.7, as stated on the download page, so don't worry.

Yohann Jardin
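
One way to double-check which Hadoop client version a downloaded package actually bundles is to look at the Hadoop jar file names shipped with it, rather than at the default in pom.xml. The following is only a minimal sketch: it assumes the distribution keeps its jars under $SPARK_HOME/jars with names like hadoop-common-2.7.3.jar, and the helper name bundled_hadoop_versions is made up for illustration.

```python
import os
import re
from pathlib import Path

# Matches names such as hadoop-common-2.7.3.jar and captures the version part.
HADOOP_JAR = re.compile(r"^hadoop-[a-z-]+?-(\d+(?:\.\d+)+)\.jar$")


def bundled_hadoop_versions(jar_names):
    """Return the sorted, de-duplicated Hadoop versions found in jar file names."""
    versions = set()
    for name in jar_names:
        match = HADOOP_JAR.match(name)
        if match:
            versions.add(match.group(1))
    return sorted(versions)


if __name__ == "__main__":
    # Assumption: SPARK_HOME points at the unpacked distribution
    # and its jars live in the jars/ subdirectory.
    jars_dir = Path(os.environ.get("SPARK_HOME", ".")) / "jars"
    names = [p.name for p in jars_dir.glob("hadoop-*.jar")]
    print(bundled_hadoop_versions(names))
```

If the list contains a single 2.7.x entry, the package was built against Hadoop 2.7 and the 2.2.0 value in pom.xml is just the source-build default. From a running spark-shell, org.apache.hadoop.util.VersionInfo.getVersion should report the same client version.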
