Posted to user@spark.apache.org by Penny Espinoza <pe...@societyconsulting.com> on 2014/09/06 01:33:32 UTC

prepending jars to the driver class path for spark-submit on YARN

Hey - I’m struggling with some dependency issues with org.apache.httpcomponents httpcore and httpclient when using spark-submit with YARN running Spark 1.0.2 on a Hadoop 2.2 cluster.  I’ve seen several posts about this issue, but no resolution.

The error message is this:


Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
        at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
        at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
        at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
        at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:85)
        at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:93)
        at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26)
        at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96)
        at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:155)
        at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:118)
        at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:102)
        at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:332)
        at com.oncue.rna.realtime.streaming.config.package$.transferManager(package.scala:76)
        at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry.<init>(SchemaRegistry.scala:27)
        at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry$lzycompute(SchemaRegistry.scala:46)
        at com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry(SchemaRegistry.scala:44)
        at com.oncue.rna.realtime.streaming.coders.KafkaAvroDecoder.<init>(KafkaAvroDecoder.scala:20)
        ... 17 more

The Apache HttpComponents libraries include the method above as of version 4.2.  The Spark 1.0.2 binaries seem to include version 4.1.
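[Editor's note: a NoSuchMethodError like the one above is the classic symptom of two httpcomponents versions landing on the classpath. One way to see which versions each side of the build pulls in is to dump the Maven dependency tree and filter it; this is a hedged sketch, and `list_http_deps`/`tree.txt` are illustrative names, not from the thread.]

```shell
# Filter a saved `mvn dependency:tree` dump for httpcomponents entries,
# so conflicting httpclient/httpcore versions stand out.
list_http_deps() {
  grep -E 'org\.apache\.httpcomponents:(httpclient|httpcore)' "$1"
}

# In your project:
#   mvn dependency:tree > tree.txt
#   list_http_deps tree.txt
```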

I can get this to work in my driver program by adding exclusions to force use of 4.1, but then I get the error in tasks even when using the --jars option of the spark-submit command.  How can I get both the driver program and the individual tasks in my spark-streaming job to use the same version of this library so my job will run all the way through?
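[Editor's note: for reference, one way to hand the same httpcomponents jars to both sides under Spark 1.0.2 on YARN is --jars (ships them to the executors) combined with --driver-class-path (adds them to the driver's classpath). This is a sketch only; every jar name, class name, and path below is a placeholder, and yarn-cluster classpath handling in 1.0.x may still need adjusting.]

```shell
# Build (and print) a spark-submit command that passes the same
# httpcomponents jars to both the driver and the executors.
build_submit_cmd() {
  local http_jars="httpclient-4.2.5.jar,httpcore-4.2.5.jar"
  echo "spark-submit --master yarn-cluster" \
       "--jars ${http_jars}" \
       "--driver-class-path httpclient-4.2.5.jar:httpcore-4.2.5.jar" \
       "--class com.example.MyJob myapp.jar"
}

build_submit_cmd   # prints the command instead of running it
```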

thanks
p

Re: prepending jars to the driver class path for spark-submit on YARN

Posted by Penny Espinoza <pe...@societyconsulting.com>.
I finally seem to have gotten past this issue.  Here’s what I did:

  *   rather than using the binary distribution, I built Spark from source to eliminate the 4.1 version of org.apache.httpcomponents from the assembly
     *   git clone https://github.com/apache/spark.git
     *   cd spark
     *   git checkout v1.0.2
     *   edited pom.xml to remove the modules sql/hive and examples
     *   export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
     *   mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
  *   rebuilt my own assembly, eliminating all exclusions I had previously included to force use of org.apache.httpcomponents 4.1
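[Editor's note: the steps above can be collected into a script sketch, split into functions so each step can be run and inspected separately. The sed patterns assume the 1.0.2 pom.xml lists the modules literally as <module>sql/hive</module> and <module>examples</module>; verify against your checkout before relying on them.]

```shell
# Step 1: fetch the Spark source at the v1.0.2 tag.
fetch_spark() {
  git clone https://github.com/apache/spark.git
  cd spark && git checkout v1.0.2
}

# Step 2: drop the sql/hive and examples modules from pom.xml in place
# (keeps a pom.xml.bak backup; works with GNU and BSD sed).
drop_unneeded_modules() {
  sed -i.bak -e '\|<module>sql/hive</module>|d' \
             -e '\|<module>examples</module>|d' pom.xml
}

# Step 3: build the assembly for YARN / Hadoop 2.2.
build_assembly() {
  export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
  mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
}

# Run in order: fetch_spark; drop_unneeded_modules; build_assembly
```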

On Sep 8, 2014, at 12:03 PM, Penny Espinoza <pe...@societyconsulting.com> wrote:

I have tried using the spark.files.userClassPathFirst option (which, incidentally, is documented now, but marked as experimental), but it just causes different errors.  I am using spark-streaming-kafka.  If I mark spark-core and spark-streaming as provided and also exclude them from the spark-streaming-kafka dependency, I get this error:

14/09/08 18:34:23 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassCastException
java.lang.ClassCastException: cannot assign instance of com.oncue.rna.realtime.streaming.spark.BaseKafkaExtractorJob$$anonfun$getEventsStream$1 to field org.apache.spark.rdd.MappedRDD.f of type scala.Function1 in instance of org.apache.spark.rdd.MappedRDD
       at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
       at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
       at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
       at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
       at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
       at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:744)


If I mark spark-core and spark-streaming as provided, but do not exclude those from the spark-streaming-kafka dependency, I get this error:

14/09/08 18:10:26 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassCastException
java.lang.ClassCastException: cannot assign instance of org.apache.spark.storage.StorageLevel to field org.apache.spark.streaming.receiver.Receiver.storageLevel of type org.apache.spark.storage.StorageLevel in instance of org.apache.spark.streaming.kafka.KafkaReceiver
       at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
       at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
       at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500)
       at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:74)
       at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:606)
       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:147)
       at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
       at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
       at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:744)

If I do not mark spark-core and spark-streaming as provided, and also omit the exclusions, I get the same error I get when they are marked as provided and excluded.





________________________________________
From: Xiangrui Meng <me...@gmail.com>
Sent: Sunday, September 07, 2014 11:40 PM
To: Victor Tso-Guillen
Cc: Penny Espinoza; Spark
Subject: Re: prepending jars to the driver class path for spark-submit on YARN

There is an undocumented configuration to put user jars in front of the
Spark jar. But I'm not very certain that it works as expected (and
this is why it is undocumented). Please try turning on
spark.yarn.user.classpath.first. -Xiangrui
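[Editor's note: for anyone wanting to try this, the flag can be set in conf/spark-defaults.conf (supported since Spark 1.0) rather than programmatically. A minimal sketch, with SPARK_HOME defaulting to the current directory:]

```shell
# Append the undocumented flag to spark-defaults.conf (entries are
# whitespace-separated key/value pairs). The mkdir is a no-op in a real
# Spark installation, where conf/ already exists.
SPARK_HOME="${SPARK_HOME:-.}"
mkdir -p "$SPARK_HOME/conf"
echo "spark.yarn.user.classpath.first true" >> "$SPARK_HOME/conf/spark-defaults.conf"
```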

On Sat, Sep 6, 2014 at 5:13 PM, Victor Tso-Guillen <vt...@paxata.com> wrote:
I ran into the same issue. What I did was use the Maven Shade plugin to
relocate my version of the httpcomponents libraries into another package.




---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org



        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)


If I mark spark-core and spark-streaming as provided, but do not exclude those from the spark-streaming-kafka dependency, I get this error:

14/09/08 18:10:26 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassCastException
java.lang.ClassCastException: cannot assign instance of org.apache.spark.storage.StorageLevel to field org.apache.spark.streaming.receiver.Receiver.storageLevel of type org.apache.spark.storage.StorageLevel in instance of org.apache.spark.streaming.kafka.KafkaReceiver
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500)
        at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:74)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:147)
        at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

If I do not mark spark-core and spark-streaming as provided, and also omit the exclusions, I get the same error as when they are marked as provided and excluded.
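For reference, the "provided plus exclusions" arrangement described above might look like this in a pom.xml. This is only a hypothetical sketch: the artifact names use the Scala 2.10 suffix current for Spark 1.0.2, and your coordinates may differ.

```xml
<!-- Hypothetical fragment: spark-streaming marked provided for the cluster,
     and excluded from spark-streaming-kafka so only one copy is assembled. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.10</artifactId>
  <version>1.0.2</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <version>1.0.2</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.10</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```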





________________________________________
From: Xiangrui Meng <me...@gmail.com>
Sent: Sunday, September 07, 2014 11:40 PM
To: Victor Tso-Guillen
Cc: Penny Espinoza; Spark
Subject: Re: prepending jars to the driver class path for spark-submit on YARN

There is an undocumented configuration to put the user's jars in front of
the Spark jar. But I'm not very certain that it works as expected (and
this is why it is undocumented). Please try turning on
spark.yarn.user.classpath.first . -Xiangrui

On Sat, Sep 6, 2014 at 5:13 PM, Victor Tso-Guillen <vt...@paxata.com> wrote:
> I ran into the same issue. What I did was use maven shade plugin to shade my
> version of httpcomponents libraries into another package.
>
>
> On Fri, Sep 5, 2014 at 4:33 PM, Penny Espinoza
> <pe...@societyconsulting.com> wrote:
>>
>> Hey - I’m struggling with some dependency issues with
>> org.apache.httpcomponents httpcore and httpclient when using spark-submit
>> with YARN running Spark 1.0.2 on a Hadoop 2.2 cluster.  I’ve seen several
>> posts about this issue, but no resolution.
>>
>> The error message is this:
>>
>>
>> Caused by: java.lang.NoSuchMethodError:
>> org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
>>         at
>> org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
>>         at
>> org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
>>         at
>> org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
>>         at
>> org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:85)
>>         at
>> org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:93)
>>         at
>> com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26)
>>         at
>> com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96)
>>         at
>> com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:155)
>>         at
>> com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:118)
>>         at
>> com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:102)
>>         at
>> com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:332)
>>         at
>> com.oncue.rna.realtime.streaming.config.package$.transferManager(package.scala:76)
>>         at
>> com.oncue.rna.realtime.streaming.models.S3SchemaRegistry.<init>(SchemaRegistry.scala:27)
>>         at
>> com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry$lzycompute(SchemaRegistry.scala:46)
>>         at
>> com.oncue.rna.realtime.streaming.models.S3SchemaRegistry$.schemaRegistry(SchemaRegistry.scala:44)
>>         at
>> com.oncue.rna.realtime.streaming.coders.KafkaAvroDecoder.<init>(KafkaAvroDecoder.scala:20)
>>         ... 17 more
>>
>> The Apache HttpComponents libraries include the method above as of version
>> 4.2.  The Spark 1.0.2 binaries seem to include version 4.1.
>>
>> I can get this to work in my driver program by adding exclusions to force
>> use of 4.1, but then I get the error in tasks even when using the --jars
>> option of the spark-submit command.  How can I get both the driver program
>> and the individual tasks in my spark-streaming job to use the same version
>> of this library so my job will run all the way through?
>>
>> thanks
>> p
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org



Re: prepending jars to the driver class path for spark-submit on YARN

Posted by Xiangrui Meng <me...@gmail.com>.
When you submit the job to yarn with spark-submit, set --conf
spark.yarn.user.classpath.first=true .
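For example (a sketch only; the application jar, main class, and YARN settings below are placeholders — the --conf flag is the point):

```shell
# Hypothetical spark-submit invocation prepending user jars on YARN.
spark-submit \
  --master yarn-cluster \
  --class com.example.MyStreamingApp \
  --conf spark.yarn.user.classpath.first=true \
  my-app-assembly.jar
```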

On Mon, Sep 8, 2014 at 10:46 AM, Penny Espinoza
<pe...@societyconsulting.com> wrote:
> I don't understand what you mean.  Can you be more specific?
>
>
> ________________________________
> From: Victor Tso-Guillen <vt...@paxata.com>
> Sent: Saturday, September 06, 2014 5:13 PM
> To: Penny Espinoza
> Cc: Spark
> Subject: Re: prepending jars to the driver class path for spark-submit on
> YARN
>
> I ran into the same issue. What I did was use maven shade plugin to shade my
> version of httpcomponents libraries into another package.
>



RE: prepending jars to the driver class path for spark-submit on YARN

Posted by Penny Espinoza <pe...@societyconsulting.com>.
I don't understand what you mean.  Can you be more specific?


________________________________
From: Victor Tso-Guillen <vt...@paxata.com>
Sent: Saturday, September 06, 2014 5:13 PM
To: Penny Espinoza
Cc: Spark
Subject: Re: prepending jars to the driver class path for spark-submit on YARN

I ran into the same issue. What I did was use maven shade plugin to shade my version of httpcomponents libraries into another package.




RE: prepending jars to the driver class path for spark-submit on YARN

Posted by Penny Espinoza <pe...@societyconsulting.com>.
Victor - Not sure what you mean.  Can you provide more detail about what you did?

________________________________
From: Victor Tso-Guillen <vt...@paxata.com>
Sent: Saturday, September 06, 2014 5:13 PM
To: Penny Espinoza
Cc: Spark
Subject: Re: prepending jars to the driver class path for spark-submit on YARN

I ran into the same issue. What I did was use maven shade plugin to shade my version of httpcomponents libraries into another package.




Re: prepending jars to the driver class path for spark-submit on YARN

Posted by Victor Tso-Guillen <vt...@paxata.com>.
I ran into the same issue. What I did was use maven shade plugin to shade
my version of httpcomponents libraries into another package.
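The shading described above might be sketched as a maven-shade-plugin relocation like the following. This is a hypothetical fragment: the plugin version and the shaded package name (`shaded.org.apache.http`) are arbitrary choices, not taken from the thread.

```xml
<!-- Hypothetical maven-shade-plugin configuration: relocates the bundled
     org.apache.http classes into a private package so they cannot clash
     with the older copy already on Spark's classpath. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.apache.http</pattern>
            <shadedPattern>shaded.org.apache.http</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With the relocation in place, your assembly's HttpComponents classes are renamed at package time and the AWS SDK calls inside the shaded jar resolve to them, while Spark continues to load its own unshaded 4.1 copy.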

