Posted to user@spark.apache.org by Eugene Morozov <ev...@gmail.com> on 2016/07/20 14:01:43 UTC

Snappy initialization issue, spark assembly jar missing snappy classes?

Greetings!

We're reading input files with newAPIHadoopFile configured to use a
multiline split. Everything is fine except for
https://issues.apache.org/jira/browse/MAPREDUCE-6549. The issue appears
to be fixed, but only in Hadoop 2.7.2, which means we have to download
Spark without Hadoop and provide our own Hadoop version. We currently
use Spark 1.6.1.
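
Roughly how we read the input - the path and record delimiter below are
illustrative, not our real values:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// sc is an existing SparkContext
val hadoopConf = new Configuration(sc.hadoopConfiguration)
// A custom record delimiter lets one logical record span several physical
// lines - the code path affected by MAPREDUCE-6549.
hadoopConf.set("textinputformat.record.delimiter", "\n\n")

val records = sc.newAPIHadoopFile(
  "hdfs:///data/input",
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  hadoopConf
).map(_._2.toString) // each Text value is one (possibly multiline) record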

That mostly works: there is documentation on how to configure it, and
Spark starts, but as soon as I use it I get a nasty exception saying
Snappy cannot be initialized. I tried a few things - updating the Snappy
version inside Hadoop, packaging Snappy into my own application jar -
but it only works when I literally copy the snappy-java.jar classes into
spark-assembly-1.6.1-hadoop2.2.0.jar. That works for now, but I dislike
this approach because I simply cannot know what else will break tomorrow.
It looks like I could just turn Snappy off, but I want to keep it: I
believe it makes sense to compress the data that gets shuffled and
stored around.
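
To be concrete, by "turn it off" I mean something like switching the
codec away from Snappy - illustration only, not what I actually want to
do:

import org.apache.spark.{SparkConf, SparkContext}

// Switch the codec used for broadcast and shuffle blocks away from snappy
// instead of disabling compression altogether. The app name is illustrative.
val conf = new SparkConf()
  .setAppName("multiline-input")
  .set("spark.io.compression.codec", "lz4") // 1.6 also accepts "lzf" and "snappy"
val sc = new SparkContext(conf)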

Could you suggest any approach other than copying these classes into the
assembled Spark jar file?


The snappy exception:
Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 69, icomputer.petersburg.epam.com): java.io.IOException: java.lang.reflect.InvocationTargetException
  at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1222)
  at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
  at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
  at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
  at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
  at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
  at org.apache.spark.scheduler.Task.run(Task.scala:89)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
  at sun.reflect.GeneratedConstructorAccessor9.newInstance(Unknown Source)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:72)
  at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:65)
  at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
  at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:167)
  at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1219)
  ... 11 more
Caused by: java.lang.IllegalArgumentException: java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
  at org.apache.spark.io.SnappyCompressionCodec$.liftedTree1$1(CompressionCodec.scala:171)
  at org.apache.spark.io.SnappyCompressionCodec$.org$apache$spark$io$SnappyCompressionCodec$$version$lzycompute(CompressionCodec.scala:168)
  at org.apache.spark.io.SnappyCompressionCodec$.org$apache$spark$io$SnappyCompressionCodec$$version(CompressionCodec.scala:168)
  at org.apache.spark.io.SnappyCompressionCodec.<init>(CompressionCodec.scala:152)
  ... 19 more
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy
  at org.apache.spark.io.SnappyCompressionCodec$.liftedTree1$1(CompressionCodec.scala:169)
  ... 22 more
--
Be well!
Jean Morozov