Posted to user@spark.apache.org by cjdc <cr...@cern.ch> on 2014/12/05 10:52:34 UTC
Re: NullPointerException When Reading Avro Sequence Files
Hi all,
I've tried the above example on Gist, but it doesn't work (at least for me).
Did anyone get this:
14/12/05 10:44:40 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
        at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:115)
        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
14/12/05 10:44:40 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
        at org.apache.avro.mapreduce.AvroKeyInputFormat.createRecordReader(AvroKeyInputFormat.java:47)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:115)
        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:103)
        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:65)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
14/12/05 10:44:40 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Thanks
Re: NullPointerException When Reading Avro Sequence Files
Posted by Simone Franzini <ca...@gmail.com>.
To me this looks like an internal error in the REPL; I am not sure what is
causing it.
Personally, I never use the REPL. Can you try writing your program as a
standalone application and running it from an IDE or with spark-submit, to
see if you still get the same error?
Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini
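As a rough sketch of that suggestion (not code from the thread: the object name, input handling and master URL below are placeholders), the snippet could be packaged as a small application and launched with spark-submit instead of being pasted into the shell:

import org.apache.spark.{SparkConf, SparkContext}

object AvroReadJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AvroReadJob")
    val sc = new SparkContext(conf)
    // ... set up the Hadoop Job and read the Avro file here, as in the snippet under discussion ...
    sc.stop()
  }
}

// launched with, for example:
//   spark-submit --class AvroReadJob --master local[*] target/avro-read-job.jar

It is also worth noting that the stack trace quoted below passes through ScalaRunTime.replStringOf into Job.toString, i.e. the exception appears to be raised while the REPL tries to print the freshly created Job, which fits the suggestion above.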
On Mon, Dec 15, 2014 at 4:54 PM, Cristovao Jose Domingues Cordeiro <
cristovao.cordeiro@cern.ch> wrote:
>
> Sure, thanks:
> warning: there were 1 deprecation warning(s); re-run with -deprecation for details
> java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
>         at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:283)
>         at org.apache.hadoop.mapreduce.Job.toString(Job.java:462)
>         at scala.runtime.ScalaRunTime$.scala$runtime$ScalaRunTime$$inner$1(ScalaRunTime.scala:324)
>         at scala.runtime.ScalaRunTime$.stringOf(ScalaRunTime.scala:329)
>         at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:337)
>         at .<init>(<console>:10)
>         at .<clinit>(<console>)
>         at $print(<console>)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:846)
>         at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1119)
>         at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:672)
>         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:703)
>         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:667)
>         at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:819)
>         at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:864)
>         at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:776)
>         at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:619)
>         at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:627)
>         at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:632)
>         at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:959)
>         at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:907)
>         at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:907)
>         at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>         at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:907)
>         at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1002)
>         at org.apache.spark.repl.Main$.main(Main.scala:31)
>         at org.apache.spark.repl.Main.main(Main.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:331)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
>
>
> Could something you omitted from your snippet be causing this exception?
>
> Cumprimentos / Best regards,
> Cristóvão José Domingues Cordeiro
> IT Department - 28/R-018
> CERN
> ------------------------------
> *From:* Simone Franzini [captainfranz@gmail.com]
> *Sent:* 15 December 2014 16:52
>
> *To:* Cristovao Jose Domingues Cordeiro
> *Subject:* Re: NullPointerException When Reading Avro Sequence Files
>
> OK, I have no idea what that is; it appears to be an internal Spark
> exception. If you can post the entire stack trace, it may give some more
> details about what is going on.
>
> Simone Franzini, PhD
>
> http://www.linkedin.com/in/simonefranzini
>
> On Mon, Dec 15, 2014 at 4:50 PM, Cristovao Jose Domingues Cordeiro <
> cristovao.cordeiro@cern.ch> wrote:
>>
>> Hi,
>>
>> thanks for that.
>> But yes, the second line is an exception; jobread is not created.
>>
>> Cumprimentos / Best regards,
>> Cristóvão José Domingues Cordeiro
>> IT Department - 28/R-018
>> CERN
>> ------------------------------
>> *From:* Simone Franzini [captainfranz@gmail.com]
>> *Sent:* 15 December 2014 16:39
>>
>> *To:* Cristovao Jose Domingues Cordeiro
>> *Subject:* Re: NullPointerException When Reading Avro Sequence Files
>>
>> I did not mention the imports needed in my code. I think these are
>> all of them:
>>
>> import org.apache.hadoop.mapreduce.Job
>> import org.apache.hadoop.io.NullWritable
>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
>> import org.apache.hadoop.fs.{ FileSystem, Path }
>> import org.apache.avro.{ Schema, SchemaBuilder }
>> import org.apache.avro.SchemaBuilder._
>> import org.apache.avro.mapreduce.{ AvroJob, AvroKeyInputFormat,
>> AvroKeyOutputFormat }
>> import org.apache.avro.mapred.AvroKey
>>
>> However, what you mentioned is a warning that I think can be ignored. I
>> don't see any exception.
>>
>> Simone Franzini, PhD
>>
>> http://www.linkedin.com/in/simonefranzini
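As a rough sketch of how those imports typically fit together when reading an Avro file into an RDD with the new Hadoop API (this is not the exact snippet from the thread; the GenericRecord type and the path handling are assumptions):

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.spark.SparkContext

// Reads the file as (AvroKey[GenericRecord], NullWritable) pairs and keeps
// only the Avro datum from each key.
def readAvro(sc: SparkContext, path: String) =
  sc.newAPIHadoopFile[AvroKey[GenericRecord], NullWritable,
    AvroKeyInputFormat[GenericRecord]](path)
    .map { case (k, _) => k.datum() }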
>>
>> On Mon, Dec 15, 2014 at 3:10 PM, Cristovao Jose Domingues Cordeiro <
>> cristovao.cordeiro@cern.ch> wrote:
>>>
>>> Hi Simone,
>>>
>>> I was finally able to get the chill package, but there is still something
>>> unrelated in your snippet that I cannot run:
>>> val jobread = new Job()
>>>
>>> I get:
>>> warning: there were 1 deprecation warning(s); re-run with -deprecation
>>> for details
>>> java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
>>>
>>>
>>> Cumprimentos / Best regards,
>>> Cristóvão José Domingues Cordeiro
>>> IT Department - 28/R-018
>>> CERN
>>> ------------------------------
>>> *From:* Simone Franzini [captainfranz@gmail.com]
>>> *Sent:* 09 December 2014 17:06
>>>
>>> *To:* Cristovao Jose Domingues Cordeiro; user
>>> *Subject:* Re: NullPointerException When Reading Avro Sequence Files
>>>
>>> You can use this Maven dependency:
>>>
>>> <dependency>
>>> <groupId>com.twitter</groupId>
>>> <artifactId>chill-avro</artifactId>
>>> <version>0.4.0</version>
>>> </dependency>
>>>
>>> Simone Franzini, PhD
>>>
>>> http://www.linkedin.com/in/simonefranzini
>>>
>>> On Tue, Dec 9, 2014 at 9:53 AM, Cristovao Jose Domingues Cordeiro <
>>> cristovao.cordeiro@cern.ch> wrote:
>>>
>>>> Thanks for the reply!
>>>>
>>>> In fact I've tried your code, but I lack the Twitter chill package and I
>>>> cannot find it online. So I am now trying this:
>>>> http://spark.apache.org/docs/latest/tuning.html#data-serialization .
>>>> But in case I can't get that to work, could you tell me where to get that
>>>> Twitter package you used?
>>>>
>>>> Thanks
>>>>
>>>> Cumprimentos / Best regards,
>>>> Cristóvão José Domingues Cordeiro
>>>> IT Department - 28/R-018
>>>> CERN
>>>> ------------------------------
>>>> *From:* Simone Franzini [captainfranz@gmail.com]
>>>> *Sent:* 09 December 2014 16:42
>>>> *To:* Cristovao Jose Domingues Cordeiro; user
>>>>
>>>> *Subject:* Re: NullPointerException When Reading Avro Sequence Files
>>>>
>>>> Hi Cristovao,
>>>>
>>>> I have seen a very similar issue that I have posted about in this
>>>> thread:
>>>>
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-NPE-with-Array-td19797.html
>>>> I think your main issue here is somewhat similar, in that the
>>>> MapWrapper Scala class is not registered. This gets registered by the
>>>> Twitter chill-scala AllScalaRegistrar class that you are currently not
>>>> using.
>>>>
>>>> As far as I understand, in order to use Avro with Spark, you also
>>>> have to use Kryo. This means you have to use the Spark KryoSerializer. This
>>>> in turn uses Twitter chill. I posted the basic code that I am using here:
>>>>
>>>>
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-read-this-avro-file-using-spark-amp-scala-td19400.html#a19491
>>>>
>>>> Maybe there is a simpler solution to your problem but I am not that
>>>> much of an expert yet. I hope this helps.
>>>>
>>>> Simone Franzini, PhD
>>>>
>>>> http://www.linkedin.com/in/simonefranzini
>>>>
>>>> On Tue, Dec 9, 2014 at 8:50 AM, Cristovao Jose Domingues Cordeiro <
>>>> cristovao.cordeiro@cern.ch> wrote:
>>>>
>>>>> Hi Simone,
>>>>>
>>>>> thanks, but I don't think that's it.
>>>>> I've tried several libraries with the --jars argument. Some do give
>>>>> what you said, but other times (when I put in the right version, I guess) I get
>>>>> the following:
>>>>> 14/12/09 15:45:54 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
>>>>> java.io.NotSerializableException: scala.collection.convert.Wrappers$MapWrapper
>>>>>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>>>>>         at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
>>>>>
>>>>>
>>>>> Which is odd, since I am reading an Avro file I wrote... with the same piece
>>>>> of code:
>>>>> https://gist.github.com/MLnick/5864741781b9340cb211
>>>>>
>>>>> Cumprimentos / Best regards,
>>>>> Cristóvão José Domingues Cordeiro
>>>>> IT Department - 28/R-018
>>>>> CERN
>>>>> ------------------------------
>>>>> *From:* Simone Franzini [captainfranz@gmail.com]
>>>>> *Sent:* 06 December 2014 15:48
>>>>> *To:* Cristovao Jose Domingues Cordeiro
>>>>> *Subject:* Re: NullPointerException When Reading Avro Sequence Files
>>>>>
>>>>> java.lang.IncompatibleClassChangeError: Found interface
>>>>> org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
>>>>>
>>>>> That is a sign that you are mixing up versions of Hadoop. This is
>>>>> particularly an issue when dealing with Avro. If you are using Hadoop 2,
>>>>> you will need the Hadoop 2 build of avro-mapred. In Maven you
>>>>> would select it with the <classifier>hadoop2</classifier> tag.
>>>>>
>>>>> Simone Franzini, PhD
>>>>>
>>>>> http://www.linkedin.com/in/simonefranzini
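For reference, the Maven dependency that advice points at would look roughly like this (the avro-mapred version is only an example; it should match the Avro version used on your cluster):

<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-mapred</artifactId>
    <version>1.7.6</version>
    <classifier>hadoop2</classifier>
</dependency>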
Re: NullPointerException When Reading Avro Sequence Files
Posted by Simone Franzini <ca...@gmail.com>.
You can use this Maven dependency:
<dependency>
    <groupId>com.twitter</groupId>
    <artifactId>chill-avro</artifactId>
    <version>0.4.0</version>
</dependency>
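For sbt users, the equivalent of the Maven dependency above should be (chill-avro is cross-built against the Scala version, hence %%):

libraryDependencies += "com.twitter" %% "chill-avro" % "0.4.0"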
Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini
Re: NullPointerException When Reading Avro Sequence Files
Posted by Simone Franzini <ca...@gmail.com>.
Hi Cristovao,
I have seen a very similar issue that I have posted about in this thread:
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-NPE-with-Array-td19797.html
I think your main issue here is somewhat similar, in that the MapWrapper
Scala class is not registered. This gets registered by the Twitter
chill-scala AllScalaRegistrar class that you are currently not using.
As far as I understand, in order to use Avro with Spark, you also have to
use Kryo. This means you have to use the Spark KryoSerializer. This in turn
uses Twitter chill. I posted the basic code that I am using here:
http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-read-this-avro-file-using-spark-amp-scala-td19400.html#a19491
Maybe there is a simpler solution to your problem but I am not that much of
an expert yet. I hope this helps.
Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini
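A minimal sketch of that wiring (the registrator class name here is a placeholder; AllScalaRegistrar lives in com.twitter.chill, which Spark already depends on):

import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.AllScalaRegistrar
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical registrator that installs chill-scala's registrations,
// including the Scala collection wrappers such as Wrappers$MapWrapper.
class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    new AllScalaRegistrar().apply(kryo)
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyKryoRegistrator") // use the fully qualified class name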