Posted to user@spark.apache.org by hermansc <he...@gmail.com> on 2015/07/30 10:48:19 UTC

Running Spark on user-provided Hadoop installation

Hi.

I want to run Spark, and more specifically the "Pre-built with user-provided
Hadoop" version from the downloads page, but I can't find any documentation
on how to connect the two components (namely Spark and Hadoop).

I've had some success in setting SPARK_CLASSPATH to my Hadoop distribution's
lib/ directory, containing jar files such as hadoop-core, hadoop-common, etc.
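
For reference, my spark-env.sh currently has something like the line below
(the CDH-style path is just an illustration from my layout):

export SPARK_CLASSPATH="/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop/lib/*"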

However, there seem to be many native libraries included in the assembly
jar for Spark versions pre-built for Hadoop distributions (I'm specifically
missing the libsnappy.so files) that are not included by default in
distributions such as Cloudera Hadoop.
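
I assume one workaround is to point the JVM at the distribution's native
libraries from spark-env.sh, e.g. with the line below (the path is only a
guess at where libsnappy.so might live), but I'd rather know the proper way:

export LD_LIBRARY_PATH="/opt/cloudera/parcels/CDH/lib/hadoop/lib/native:$LD_LIBRARY_PATH"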

Has anyone here actually tried to run Spark without Hadoop included in the
assembly jar, and/or do you know of any resources where I can read about the
proper way of connecting them?

As an aside, the spark-assembly jar in the Spark version pre-built for
user-provided Hadoop distributions is named
spark-assembly-1.4.0-hadoop2.2.0.jar, which doesn't make sense - it should
be called spark-assembly-1.4.0-without-hadoop.jar :)

-- 
Herman





Re: Running Spark on user-provided Hadoop installation

Posted by gauravsehgal <ga...@gmail.com>.
Refer: http://spark.apache.org/docs/latest/hadoop-provided.html

Specifically, if you want to refer to s3a paths, edit spark-env.sh and
add the following lines at the end:
SPARK_DIST_CLASSPATH=$(/path/to/hadoop/hadoop-2.7.1/bin/hadoop classpath)
export SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/path/to/hadoop/hadoop-2.7.1/share/hadoop/tools/lib/*"





Re: Running Spark on user-provided Hadoop installation

Posted by Ted Yu <yu...@gmail.com>.
Herman:
For "Pre-built with user-provided Hadoop", spark-1.4.1-bin-hadoop2.6.tgz,
e.g., uses hadoop-2.6 profile which defines versions of projects Spark
depends on.

A Hadoop cluster is used to provide storage (HDFS) and resource management
(YARN).
For the latter, please see:
https://spark.apache.org/docs/latest/running-on-yarn.html
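
For example, once HADOOP_CONF_DIR points at your cluster's configuration,
you can submit the bundled SparkPi example against YARN (paths below are
illustrative):

export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --master yarn-cluster \
  --class org.apache.spark.examples.SparkPi \
  lib/spark-examples*.jar 10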

Cheers

On Thu, Jul 30, 2015 at 1:48 AM, hermansc <he...@gmail.com> wrote:

> Hi.
>
> I want to run Spark, and more specifically the "Pre-built with
> user-provided Hadoop" version from the downloads page, but I can't find
> any documentation on how to connect the two components (namely Spark and
> Hadoop).
>
> I've had some success in setting SPARK_CLASSPATH to my Hadoop
> distribution's lib/ directory, containing jar files such as hadoop-core,
> hadoop-common, etc.
>
> However, there seem to be many native libraries included in the assembly
> jar for Spark versions pre-built for Hadoop distributions (I'm specifically
> missing the libsnappy.so files) that are not included by default in
> distributions such as Cloudera Hadoop.
>
> Has anyone here actually tried to run Spark without Hadoop included in the
> assembly jar, and/or do you know of any resources where I can read about
> the proper way of connecting them?
>
> As an aside, the spark-assembly jar in the Spark version pre-built for
> user-provided Hadoop distributions is named
> spark-assembly-1.4.0-hadoop2.2.0.jar, which doesn't make sense - it should
> be called spark-assembly-1.4.0-without-hadoop.jar :)
>
> --
> Herman
>