Posted to user@spark.apache.org by Jon Chase <jo...@gmail.com> on 2014/11/25 15:06:02 UTC

Spark cluster with Java 8 using ./spark-ec2

I'm trying to use the spark-ec2 command to launch a Spark cluster that runs
Java 8, but so far I haven't been able to get the Spark processes to use
the right JVM at startup.

Here's the command I use to launch the cluster. Note that I'm using the
user-data feature to install Java 8:

./spark-ec2 -k spark -i ~/.ssh/spark.pem \
          -t m3.large -s 1  \
          --user-data=java8.sh launch spark
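
(--user-data is handed to the instances as EC2 user data, so cloud-init
should run the script at first boot on both the master and the slave. To
sanity-check that it actually executed everywhere, I run something like the
following on each node; the log path assumes the default Amazon Linux-based
AMI that spark-ec2 launches, and <instance> is a placeholder:)

# confirm cloud-init ran the script and which java is now the default
ssh root@<instance> 'java -version 2>&1; tail -n 20 /var/log/cloud-init.log'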


After the cluster is running, I can SSH in and see that the default Java
version is indeed 8:

> ssh root@...

$ echo $JAVA_HOME
/usr/java/default

$ java -version
java version "1.8.0"
Java(TM) SE Runtime Environment (build 1.8.0-b132)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b70, mixed mode)


Despite this, the Spark processes still seem to be using Java 7.  I've
tried running sbin/stop-all.sh and start-all.sh from the master, but that
doesn't seem to help.
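
To check which JVM the daemons are actually running on, something like this
should work on each node (just a sketch; org.apache.spark.deploy.worker.Worker
is the standalone worker's main class, with Master the analogue on the
master node):

# find the worker daemon's PID, then resolve the java binary it runs on
pid=$(pgrep -f org.apache.spark.deploy.worker.Worker | head -n 1)
readlink -f "/proc/$pid/exe"
# a path under a 1.7 JDK here confirms the daemons never picked up Java 8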

What magic incantation am I missing?

java8.sh user data script:

#!/bin/bash

# Check java version
JAVA_VER=$(java -version 2>&1 | sed 's/java version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')

if [ "$JAVA_VER" -lt 18 ]
then
    # Download jdk 8
    echo "Downloading and installing jdk 8"
    wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8-b132/jdk-8-linux-x64.rpm"

    # Silent install
    yum -y install jdk-8-linux-x64.rpm

    # Figure out how many versions of Java we currently have
    NR_OF_OPTIONS=$(echo 0 | alternatives --config java 2>/dev/null | grep 'There ' | awk '{print $3}' | tail -1)

    echo "Found $NR_OF_OPTIONS existing versions of java. Adding new
version."

    # Make the new java version available via /etc/alternatives
    alternatives --install /usr/bin/java java /usr/java/default/bin/java 1

    # Make java 8 the default
    echo $(($NR_OF_OPTIONS + 1)) | alternatives --config java
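    # (Piping a menu number into "alternatives --config" is fragile if
    # NR_OF_OPTIONS is mis-parsed; "alternatives --set java
    # /usr/java/default/bin/java" should select the entry deterministically.)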

    # Set some variables
    export JAVA_HOME=/usr/java/default   # the install directory, not the java binary
    export JRE_HOME=/usr/java/default/jre
    export PATH=$PATH:/usr/java/default/bin
fi

# Check java version again
JAVA_VER=$(java -version 2>&1 | sed 's/java version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')

echo "export JAVA_HOME=/usr/java/default" >> /root/.bash_profile


. ~/.bash_profile

echo "Java version is $JAVA_VER"
echo "JAVA_HOME: $JAVA_HOME"
echo "JRE_HOME: $JRE_HOME"
echo "PATH: $PATH"

Here's the stack trace from the stdout of the spark-submit command:

14/11/25 14:01:11 INFO scheduler.TaskSetManager: Lost task 0.3 in stage 1.0 (TID 7) on executor ip-xx-xx-xxx-xx.eu-west-1.compute.internal: java.lang.UnsupportedClassVersionError (foo/spark/Main : Unsupported major.minor version 52.0) [duplicate 3]
14/11/25 14:01:11 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
14/11/25 14:01:11 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
14/11/25 14:01:11 INFO scheduler.TaskSchedulerImpl: Stage 1 was cancelled
14/11/25 14:01:11 INFO scheduler.DAGScheduler: Failed to run saveAsHadoopFile at Main.java:146
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, ip-xx-xx-xxx-xx.eu-west-1.compute.internal): java.lang.UnsupportedClassVersionError: foo/spark/Main : Unsupported major.minor version 52.0
        java.lang.ClassLoader.defineClass1(Native Method)
        java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        java.security.AccessController.doPrivileged(Native Method)
        java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        java.lang.Class.forName0(Native Method)
        java.lang.Class.forName(Class.java:274)
        org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:59)
        java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        java.io.ObjectInputStream.readClass(ObjectInputStream.java:1483)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1333)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        scala.collection.immutable.$colon$colon.readObject(List.scala:362)
        sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        java.lang.reflect.Method.invoke(Method.java:606)
        java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
        org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:60)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
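
For context: major.minor version 52.0 is the class-file version for Java 8,
so the error means executors still on a Java 7 JVM (class-file version 51.0)
are being asked to load classes compiled for Java 8. Two things that can
help; myapp.jar is a placeholder for the application jar:

# confirm what the jar was compiled for (major version 52 = Java 8, 51 = Java 7)
javap -verbose -cp myapp.jar foo.spark.Main | grep 'major version'

# until every executor runs Java 8, building the job for Java 7 bytecode
# sidesteps the error (at the cost of Java 8 language features)
javac -source 1.7 -target 1.7 -d classes src/foo/spark/Main.java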