Posted to user@spark.apache.org by Andrew Lee <al...@hotmail.com> on 2014/03/25 21:47:05 UTC

Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?

Hi All,
I'm getting the following error when I execute start-master.sh which also invokes spark-class at the end.

Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/
You need to build Spark with 'sbt/sbt assembly' before running this program.
After digging into the code, I see the CLASSPATH is hardcoded with "spark-assembly.*hadoop.*.jar".

In bin/spark-class :

if [ ! -f "$FWDIR/RELEASE" ]; then
  # Exit if the user hasn't compiled Spark
  num_jars=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar" | wc -l)
  jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar")
  if [ "$num_jars" -eq "0" ]; then
    echo "Failed to find Spark assembly in $FWDIR/assembly/target/scala-$SCALA_VERSION/" >&2
    echo "You need to build Spark with 'sbt/sbt assembly' before running this program." >&2
    exit 1
  fi
  if [ "$num_jars" -gt "1" ]; then
    echo "Found multiple Spark assembly jars in $FWDIR/assembly/target/scala-$SCALA_VERSION:" >&2
    echo "$jars_list"
    echo "Please remove all but one jar."
    exit 1
  fi
fi
Is there any reason why this only grabs spark-assembly.*hadoop.*.jar? I am trying to run Spark linked against my own version of Hadoop under /opt/hadoop23/, and I use 'sbt/sbt clean package' to build the package without the Hadoop jar. What is the correct way to link to my own Hadoop jar?
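
For reference, that grep only matches the single fat assembly jar that 'sbt/sbt assembly' produces, which I don't have since I only ran 'sbt/sbt clean package'. The script is expecting something roughly like this (the file name below is just an example; the exact name depends on the Spark and Hadoop versions):

$ ls assembly/target/scala-2.10/
spark-assembly-0.9.1-hadoop2.2.0.jar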


RE: Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?

Posted by Andrew Lee <al...@hotmail.com>.
Hi Paul,
I got it sorted out.
The problem is that the Hadoop JARs are built into the assembly JAR when you run:
sbt/sbt clean assembly
What I did instead was:
sbt/sbt clean package
This will only give you the small per-module JARs. The next step is to update the CLASSPATH in the bin/compute-classpath.sh script manually, appending all the JARs.
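
A rough sketch of that kind of edit at the end of bin/compute-classpath.sh, showing how the external Hadoop jars could be appended (the directories below are illustrative; adjust them to however /opt/hadoop23/ is laid out):

# Add the external Hadoop config dir and jars to the classpath.
CLASSPATH="$CLASSPATH:/opt/hadoop23/etc/hadoop"
for jar in /opt/hadoop23/share/hadoop/common/*.jar \
           /opt/hadoop23/share/hadoop/common/lib/*.jar \
           /opt/hadoop23/share/hadoop/hdfs/*.jar \
           /opt/hadoop23/share/hadoop/yarn/*.jar; do
  CLASSPATH="$CLASSPATH:$jar"
done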
With 'sbt/sbt assembly', we can't introduce our own Hadoop patch, since the build will always pull Hadoop from the Maven repo, unless we hijack the repository path or do a 'mvn install' locally. This is more of a hack, I think.
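
For reference, a minimal sketch of the local 'mvn install' route (the jar path, version, and coordinates below are illustrative, and it assumes the Spark build can resolve artifacts from the local Maven repository):

# Install a patched Hadoop artifact into the local ~/.m2 repository.
mvn install:install-file \
  -Dfile=/opt/hadoop23/share/hadoop/common/hadoop-common-2.3.0.jar \
  -DgroupId=org.apache.hadoop \
  -DartifactId=hadoop-common \
  -Dversion=2.3.0 \
  -Dpackaging=jar

# Then build the assembly against that Hadoop version.
SPARK_HADOOP_VERSION=2.3.0 sbt/sbt assembly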


Date: Tue, 25 Mar 2014 15:23:08 -0700
Subject: Re: Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?
From: paulmschooss@gmail.com
To: user@spark.apache.org

Andrew, 
I ran into the same problem and eventually settled on just running the jars directly with java. Since we use sbt to build our jars, all the dependencies are built into the jar itself, so there is no need for random classpaths.


On Tue, Mar 25, 2014 at 1:47 PM, Andrew Lee <al...@hotmail.com> wrote:

Hi All,
I'm getting the following error when I execute start-master.sh which also invokes spark-class at the end.

Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/
You need to build Spark with 'sbt/sbt assembly' before running this program.

After digging into the code, I see the CLASSPATH is hardcoded with "spark-assembly.*hadoop.*.jar".

In bin/spark-class :

if [ ! -f "$FWDIR/RELEASE" ]; then
  # Exit if the user hasn't compiled Spark
  num_jars=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar" | wc -l)

  jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar")
  if [ "$num_jars" -eq "0" ]; then

    echo "Failed to find Spark assembly in $FWDIR/assembly/target/scala-$SCALA_VERSION/" >&2
    echo "You need to build Spark with 'sbt/sbt assembly' before running this program." >&2

    exit 1
  fi
  if [ "$num_jars" -gt "1" ]; then
    echo "Found multiple Spark assembly jars in $FWDIR/assembly/target/scala-$SCALA_VERSION:" >&2

    echo "$jars_list"
    echo "Please remove all but one jar."
    exit 1
  fi
fi


Is there any reason why this only grabs spark-assembly.*hadoop.*.jar? I am trying to run Spark linked against my own version of Hadoop under /opt/hadoop23/, and I use 'sbt/sbt clean package' to build the package without the Hadoop jar. What is the correct way to link to my own Hadoop jar?

Re: Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?

Posted by Paul Schooss <pa...@gmail.com>.
Andrew,

I ran into the same problem and eventually settled on just running the jars
directly with java. Since we use sbt to build our jars, all the dependencies
are built into the jar itself, so there is no need for random classpaths.
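
A minimal sketch of that approach (the jar name and main class below are made up for illustration; the self-contained jar comes from the sbt-assembly plugin):

# Build a fat jar with all dependencies, then run it directly with java.
sbt assembly
java -cp target/scala-2.10/myapp-assembly-1.0.jar com.example.MyDriver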


On Tue, Mar 25, 2014 at 1:47 PM, Andrew Lee <al...@hotmail.com> wrote:

> Hi All,
>
> I'm getting the following error when I execute start-master.sh which also
> invokes spark-class at the end.
>
> Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/
>
> You need to build Spark with 'sbt/sbt assembly' before running this
> program.
>
>
> After digging into the code, I see the CLASSPATH is hardcoded with "
> spark-assembly.*hadoop.*.jar".
>
> In bin/spark-class :
>
>
> if [ ! -f "$FWDIR/RELEASE" ]; then
>
>   # Exit if the user hasn't compiled Spark
>
>   num_jars=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep
> "spark-assembly.*hadoop.*.jar" | wc -l)
>
>   jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep
> "spark-assembly.*hadoop.*.jar")
>
>   if [ "$num_jars" -eq "0" ]; then
>
>     echo "Failed to find Spark assembly in
> $FWDIR/assembly/target/scala-$SCALA_VERSION/" >&2
>
>     echo "You need to build Spark with 'sbt/sbt assembly' before running
> this program." >&2
>
>     exit 1
>
>   fi
>
>   if [ "$num_jars" -gt "1" ]; then
>
>     echo "Found multiple Spark assembly jars in
> $FWDIR/assembly/target/scala-$SCALA_VERSION:" >&2
>
>     echo "$jars_list"
>
>     echo "Please remove all but one jar."
>
>     exit 1
>
>   fi
>
> fi
>
>
> Is there any reason why this is only grabbing spark-assembly.*hadoop.*.jar?
> I am trying to run Spark that links to my own version of Hadoop under
> /opt/hadoop23/,
>
> and I use 'sbt/sbt clean package' to build the package without the Hadoop
> jar. What is the correct way to link to my own Hadoop jar?
>
>
>
>