Posted to user@spark.apache.org by maxdml <ma...@gmail.com> on 2015/07/10 15:21:18 UTC

Re: Issues when combining Spark and a third party java library

I'm using Hadoop 2.5.2 with Spark 1.4.0, and I can also see the following in my logs:

15/07/09 06:39:02 DEBUG HadoopRDD: SplitLocationInfo and other new Hadoop classes are unavailable. Using the older Hadoop location info code.
java.lang.ClassNotFoundException: org.apache.hadoop.mapred.InputSplitWithLocationInfo
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:264)
  at org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.<init>(HadoopRDD.scala:386)
  at org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:396)
  at org.apache.spark.rdd.HadoopRDD$.<init>(HadoopRDD.scala:395)
  at org.apache.spark.rdd.HadoopRDD$.<clinit>(HadoopRDD.scala)
  at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:165)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
  at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:290)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:290)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
  at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:289)
  at WordCount$.main(WordCount.scala:13)
  at WordCount.main(WordCount.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:497)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
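
For context, the job is a plain word count. A minimal sketch of what runs
up to the reduceByKey at WordCount.scala:13 in the trace (the HDFS paths
are illustrative, not my real ones):

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
        // textFile on an HDFS path builds a HadoopRDD; computing its
        // partitions is what triggers the SplitInfoReflections lookup above.
        sc.textFile("hdfs:///path/to/input")
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)                       // the call in the trace
          .saveAsTextFile("hdfs:///path/to/output")
        sc.stop()
      }
    }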


The application I launch through spark-submit can access data on HDFS,
though, and I launch the script with HADOOP_HOME set.
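
For what it's worth, the launch looks roughly like this (the master URL,
paths, and jar name are illustrative):

    export HADOOP_HOME=/opt/hadoop-2.5.2   # illustrative install path
    $SPARK_HOME/bin/spark-submit \
      --class WordCount \
      --master spark://master:7077 \
      wordcount.jar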



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Issues-when-combining-Spark-and-a-third-party-java-library-tp21367p23765.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Issues when combining Spark and a third party java library

Posted by Max Demoulin <ma...@gmail.com>.
Yes,

Thank you.

--
Henri Maxime Demoulin

2015-07-12 2:53 GMT-04:00 Akhil Das <ak...@sigmoidanalytics.com>:

> Did you try setting the HADOOP_CONF_DIR?
>
> Thanks
> Best Regards
>
> On Sat, Jul 11, 2015 at 3:17 AM, maxdml <ma...@gmail.com> wrote:
>
>> Also, it's worth noting that I'm using the prebuilt version for Hadoop 2.4
>> and higher from the official website.
>>
>

Re: Issues when combining Spark and a third party java library

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Did you try setting the HADOOP_CONF_DIR?
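
For example, something along these lines before submitting, pointing at
the directory that holds core-site.xml and hdfs-site.xml (the path below
is illustrative):

    export HADOOP_CONF_DIR=/opt/hadoop-2.5.2/etc/hadoop
    $SPARK_HOME/bin/spark-submit --class WordCount wordcount.jar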

Thanks
Best Regards

On Sat, Jul 11, 2015 at 3:17 AM, maxdml <ma...@gmail.com> wrote:

> Also, it's worth noting that I'm using the prebuilt version for Hadoop 2.4
> and higher from the official website.
>
>

Re: Issues when combining Spark and a third party java library

Posted by maxdml <ma...@gmail.com>.
Also, it's worth noting that I'm using the prebuilt version for Hadoop 2.4
and higher from the official website.
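
If I read it right, org.apache.hadoop.mapred.InputSplitWithLocationInfo was
only added in Hadoop 2.5, so an assembly built against 2.4 would not bundle
it, which would explain the ClassNotFoundException above. One way to check
is to list the assembly jar that ships with the prebuilt distribution (the
jar name is illustrative and varies per download):

    jar tf $SPARK_HOME/lib/spark-assembly-1.4.0-hadoop2.4.0.jar \
      | grep InputSplitWithLocationInfo

If the grep comes back empty, the class is indeed absent, and the DEBUG
message just means Spark falls back to its older location info code, as the
log itself says, rather than failing the job.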



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Issues-when-combining-Spark-and-a-third-party-java-library-tp21367p23770.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
