Posted to user@spark.apache.org by Manohar753 <ma...@happiestminds.com> on 2017/04/27 14:31:59 UTC
JavaRDD collectAsMap throws java.lang.NegativeArraySizeException
Hi All,
I am getting the exception below while converting my RDD to a Map; the code is shown below. My data is only a ~200 MB Snappy-compressed file.
@SuppressWarnings("unchecked")
public Tuple2<Map<String, byte[]>, String> getMatchData(String location, String key) {
    // Read the Parquet files at `location` as Avro GenericRecords
    ParquetInputFormat.setReadSupportClass(this.getJob(),
        (Class<AvroReadSupport<GenericRecord>>) (Class<?>) AvroReadSupport.class);
    JavaPairRDD<Void, GenericRecord> avroRDD =
        this.getSparkContext().newAPIHadoopFile(location,
            (Class<ParquetInputFormat<GenericRecord>>) (Class<?>) ParquetInputFormat.class,
            Void.class, GenericRecord.class, this.getJob().getConfiguration());
    // Key each record by the field selected via the SpEL expression
    JavaPairRDD<String, GenericRecord> kv =
        avroRDD.mapToPair(new MapAvroToKV(key, new SpelExpressionParser()));
    Schema schema = kv.first()._2().getSchema();
    // Serialize each record to bytes, then collect the whole map to the driver
    JavaPairRDD<String, byte[]> bytesKV = kv.mapToPair(new AvroToBytesFunction());
    Map<String, byte[]> map = bytesKV.collectAsMap();
    Map<String, byte[]> hashmap = new HashMap<String, byte[]>(map);
    return new Tuple2<>(hashmap, schema.toString());
}
Please share any thoughts; thanks in advance.
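(Note: the trace below shows the NegativeArraySizeException coming from Kryo's Input.readBytes while deserializing a torrent-broadcast block, which suggests a serialized byte-array length that is corrupt or over a limit rather than a problem in the Avro code itself. As a sketch of a first mitigation to try, not a confirmed fix, one could raise the Kryo buffer ceiling and the driver-side limits, since collectAsMap pulls the entire map onto the driver. The config keys are real Spark settings; the class name and jar are placeholders:)

```shell
# Hypothetical spark-submit invocation -- the --conf keys are real Spark
# settings, but whether they resolve this particular failure is an assumption.
spark-submit \
  --conf spark.kryoserializer.buffer.max=512m \
  --conf spark.driver.maxResultSize=2g \
  --driver-memory 4g \
  --class com.example.DoubleClickSORJob \
  my-job.jar
```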
04/27 03:13:18 INFO YarnClusterScheduler: Cancelling stage 11
04/27 03:13:18 INFO DAGScheduler: ResultStage 11 (collectAsMap at DoubleClickSORJob.java:281) failed in 0.734 s due to Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 15, ip-172-31-50-58.ec2.internal, executor 1): java.io.IOException: java.lang.NegativeArraySizeException
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1276)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:81)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NegativeArraySizeException
    at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:325)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:60)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:43)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
    at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:244)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$10.apply(TorrentBroadcast.scala:286)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1303)
    at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:287)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:221)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1269)
    ... 11 more
Driver stacktrace:
04/27 03:13:18 INFO DAGScheduler: Job 11 failed: collectAsMap at DoubleClickSORJob.java:281, took 3.651447 s
04/27 03:13:18 ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 15, ip-172-31-50-58.ec2.internal, executor 1): java.io.IOException: java.lang.NegativeArraySizeException
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/javaRDD-to-collectasMap-throuwa-ava-lang-NegativeArraySizeException-tp28631.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org