Posted to user@spark.apache.org by Guillermo Cabrera <gu...@gmail.com> on 2014/01/31 21:39:39 UTC

Connecting to remote Spark cluster using Java+Maven

Hi:

I have a 2-node Spark cluster that I built with Hadoop 2.2.0 compatibility,
and I have HDFS running on both machines. Everything works great: I can read
files from HDFS through the Spark shell. My question is about what is
required to connect to this cluster from a machine outside it, using Java
and Maven. So, if I am writing a Java application built with Maven...

a) Can I use the make-distribution.sh script bundled in the Spark directory
and give the produced JAR to someone else to include as a dependency in
Maven (something like the snippet below), and then have them point to the
Spark Master in my cluster when they want to run their work?
b) Or do I need to connect in a different way, maybe without the JAR?
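
For concreteness, what I mean in (a) is the other person adding something
like this to their pom.xml, with the JAR from make-distribution.sh placed
in their project (the artifactId, version and path here are just
placeholders I made up):

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-assembly</artifactId>
    <version>0.9.0-incubating</version>
    <scope>system</scope>
    <systemPath>${basedir}/lib/spark-assembly.jar</systemPath>
  </dependency>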

Has anyone successfully used Java+Maven to connect to a remote cluster? I
get the following error; it might just be an issue with how I set things up
in Maven, since this class is included in the JAR produced by
make-distribution.sh:

java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/Function
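
For reference, the code I am running is essentially the quick-start example
pointed at my cluster (the master URL, paths and file names below are
placeholders):

  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.api.java.JavaSparkContext;
  import org.apache.spark.api.java.function.Function;

  public class SimpleApp {
    public static void main(String[] args) {
      // Connect to the standalone master instead of "local"
      JavaSparkContext sc = new JavaSparkContext(
          "spark://my-master:7077", "Simple App",
          "/path/to/spark", new String[]{"target/simple-project-1.0.jar"});

      // Read a file from HDFS on the cluster
      JavaRDD<String> lines = sc.textFile("hdfs://my-master:9000/some/file.txt");

      // This filter uses the Function class from the error above
      long numAs = lines.filter(new Function<String, Boolean>() {
        public Boolean call(String s) { return s.contains("a"); }
      }).count();

      System.out.println("Lines with a: " + numAs);
    }
  }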

I also tried following
http://spark.incubator.apache.org/docs/latest/quick-start.html, but if I
use the sample pom.xml listed there, I get the following error:

java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AppendRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
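
Since the VerifyError comes from HDFS protobuf classes, do I perhaps need to
pin hadoop-client in the pom.xml to the version my cluster runs, something
like this? (2.2.0 matches my build; this is just a guess on my part.)

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
  </dependency>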

Thanks,
Gui