Posted to user@spark.apache.org by Nick Pentreath <ni...@gmail.com> on 2014/04/08 16:50:27 UTC

NPE using saveAsTextFile

Hi

I'm using Spark 0.9.0.

When calling saveAsTextFile on an RDD loaded from a custom Hadoop
InputFormat (via newAPIHadoopRDD), I get the error below.

If I call count, I get the correct number of records, so the InputFormat
is being read correctly. The issue only appears when I call
saveAsTextFile.

If I call first(), I also get the correct output, so it doesn't appear to
be a problem with the data or the InputFormat.

Any idea what the actual problem is? The stack trace is not obvious,
though it seems to be ResultTask that ultimately triggers it.

Is this a known issue at all?


======

14/04/08 16:00:46 ERROR OneForOneStrategy:
java.lang.NullPointerException
	at com.typesafe.config.impl.SerializedConfigValue.writeOrigin(SerializedConfigValue.java:202)
	at com.typesafe.config.impl.ConfigImplUtil.writeOrigin(ConfigImplUtil.java:228)
	at com.typesafe.config.ConfigException.writeObject(ConfigException.java:58)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:975)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1480)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
	at scala.collection.immutable.$colon$colon.writeObject(List.scala:379)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:975)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1480)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:28)
	at org.apache.spark.scheduler.ResultTask$.serializeInfo(ResultTask.scala:48)
	at org.apache.spark.scheduler.ResultTask.writeExternal(ResultTask.scala:123)
	at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1443)
	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1414)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:28)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:778)
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:724)
	at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:554)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Re: NPE using saveAsTextFile

Posted by Nick Pentreath <ni...@gmail.com>.
There was a closure over the config object lurking around. In any case,
upgrading Typesafe Config to 1.2.0 did the trick, so it seems to have been
a bug in Typesafe Config.

Thanks Matei!
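
For anyone who finds this thread later, here is a minimal sketch of the mechanism in plain Java, with neither Spark nor Typesafe Config on the classpath. JobConfig, SerFn and the method names are all made up for illustration, and the stand-in fails with NotSerializableException rather than the NPE above. The only point is that whatever a closure references gets serialized along with it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ClosureCaptureSketch {

    // Hypothetical stand-in for a config object that does not serialize cleanly.
    static class JobConfig {
        private final String separator;
        JobConfig(String separator) { this.separator = separator; }
        String separator() { return separator; }
    }

    // A function type that, like a task closure, has to be Java-serializable.
    interface SerFn<A, B> extends Function<A, B>, Serializable {}

    // References cfg inside the lambda, so the whole JobConfig is captured.
    static SerFn<String, String> capturesConfig(JobConfig cfg) {
        return line -> line + cfg.separator();
    }

    // Extracts the value first, so only a String is captured.
    static SerFn<String, String> capturesValue(JobConfig cfg) {
        final String sep = cfg.separator();
        return line -> line + sep;
    }

    // Dry-run Java serialization and report whether it succeeds.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        JobConfig cfg = new JobConfig(",");
        System.out.println(serializes(capturesConfig(cfg)));
        System.out.println(serializes(capturesValue(cfg)));
    }
}
```

Here serializes(capturesConfig(cfg)) reports false and serializes(capturesValue(cfg)) reports true, because the second closure carries only a String.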


On Thu, Apr 10, 2014 at 8:46 AM, Nick Pentreath <ni...@gmail.com> wrote:

> Ok I thought it may be closing over the config option. I am using config
> for job configuration, but extracting vals from that. So not sure why as I
> thought I'd avoided closing over it. Will go back to source and see where
> it is creeping in.

Re: NPE using saveAsTextFile

Posted by Nick Pentreath <ni...@gmail.com>.
Ok I thought it may be closing over the config option. I am using config
for job configuration, but extracting vals from that. So not sure why as I
thought I'd avoided closing over it. Will go back to source and see where
it is creeping in.
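
One common way this creeps in even when the values are extracted up front (an assumption, not something confirmed in this thread) is that the extracted values live as fields on an object that also holds the config. Reading such a field inside the closure captures the enclosing instance, and the config comes along with it. A rough Java sketch with hypothetical names (Job, JobConfig, SerFn):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class EnclosingCaptureSketch {

    // A function type that, like a task closure, has to be Java-serializable.
    interface SerFn<A, B> extends Function<A, B>, Serializable {}

    // Hypothetical stand-in for a config object that must not travel with a task.
    static class JobConfig {
        String get(String key) { return "out-" + key; }
    }

    // Hypothetical job class: holds the config plus a value already extracted from it.
    static class Job implements Serializable {
        private final JobConfig config = new JobConfig();
        private final String prefix = config.get("prefix");

        // Reads the field inside the lambda, so the lambda captures `this`,
        // i.e. the whole Job including its JobConfig field.
        SerFn<String, String> viaField() {
            return s -> prefix + ":" + s;
        }

        // Copies the field to a local first, so only the String is captured.
        SerFn<String, String> viaLocal() {
            final String p = prefix;
            return s -> p + ":" + s;
        }
    }

    // Dry-run Java serialization and report whether it succeeds.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Job job = new Job();
        System.out.println(serializes(job.viaField()));
        System.out.println(serializes(job.viaLocal()));
    }
}
```

Both closures compute the same result, but only viaLocal() serializes. viaField() pulls in the Job, and through it the JobConfig.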



On Thu, Apr 10, 2014 at 8:42 AM, Matei Zaharia <ma...@gmail.com> wrote:

> I haven't seen this but it may be a bug in Typesafe Config, since this is
> serializing a Config object. We don't actually use Typesafe Config
> ourselves.
>
> Do you have any nulls in the data itself by any chance? And do you know
> how that Config object is getting there?
>
> Matei

Re: NPE using saveAsTextFile

Posted by Matei Zaharia <ma...@gmail.com>.
I haven’t seen this but it may be a bug in Typesafe Config, since this is serializing a Config object. We don’t actually use Typesafe Config ourselves.

Do you have any nulls in the data itself by any chance? And do you know how that Config object is getting there?

Matei
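
One way to approach the "how is it getting there" question is a dry-run serialization that records every class it visits. The sketch below is an illustration under assumptions, not something Spark provides: TracingStream, Holder and JobConfig are hypothetical names, and it relies only on the standard ObjectOutputStream replaceObject hook.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SerializationTraceSketch {

    // Hypothetical stand-ins: a config object and something a task ends up referencing.
    static class JobConfig {}

    static class Holder implements Serializable {
        final JobConfig config = new JobConfig();
    }

    // ObjectOutputStream that records the class of every object it is asked to write.
    static class TracingStream extends ObjectOutputStream {
        final List<String> seen = new ArrayList<>();

        TracingStream(OutputStream out) throws IOException {
            super(out);
            enableReplaceObject(true); // ask the stream to call replaceObject per object
        }

        @Override
        protected Object replaceObject(Object obj) {
            seen.add(obj.getClass().getName());
            return obj; // record only, never actually replace
        }
    }

    // Attempts to serialize root and returns the class names visited, in order.
    // If serialization fails part-way, the partial trace is returned.
    static List<String> trace(Object root) {
        TracingStream out;
        try {
            out = new TracingStream(new ByteArrayOutputStream());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            out.writeObject(root);
        } catch (IOException | RuntimeException e) {
            // Serialization failed; the classes seen so far are what we want.
        }
        return out.seen;
    }

    public static void main(String[] args) {
        trace(new Holder()).forEach(System.out::println);
    }
}
```

Passing the function or object you are about to hand to Spark as root should show which of your own classes leads to the config object. The last entries in the trace are the ones closest to the failure.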

On Apr 9, 2014, at 11:38 PM, Nick Pentreath <ni...@gmail.com> wrote:

> Anyone have a chance to look at this?
> 
> Am I just doing something silly somewhere?
> 
> If it makes any difference, I am using the elasticsearch-hadoop plugin for ESInputFormat. But as I say, I can parse the data (count, first() etc). I just can't save it as text file.


Re: NPE using saveAsTextFile

Posted by Nick Pentreath <ni...@gmail.com>.
Anyone have a chance to look at this?

Am I just doing something silly somewhere?

If it makes any difference, I am using the elasticsearch-hadoop plugin for
ESInputFormat. But as I say, I can parse the data (count, first() etc). I
just can't save it as text file.




On Tue, Apr 8, 2014 at 4:50 PM, Nick Pentreath <ni...@gmail.com>wrote:

> Hi
>
> I'm using Spark 0.9.0.
>
> When calling saveAsTextFile on a custom hadoop inputformat (loaded with
> newAPIHadoopRDD), I get the following error below.
>
> If I call count, I get the correct count of number of records, so the
> inputformat is being read correctly... the issue only appears when trying
> to use saveAsTextFile.
>
> If I call first() I get the correct output, also. So it doesn't appear to
> be anything with the data or inputformat.
>
> Any idea what the actual problem is, since this stack trace is not obvious
> (though it seems to be in ResultTask which ultimately causes this).
>
> Is this a known issue at all?
>
>
> ======
>
> 14/04/08 16:00:46 ERROR OneForOneStrategy:
> java.lang.NullPointerException
> 	at com.typesafe.config.impl.SerializedConfigValue.writeOrigin(SerializedConfigValue.java:202)
> 	at com.typesafe.config.impl.ConfigImplUtil.writeOrigin(ConfigImplUtil.java:228)
> 	at com.typesafe.config.ConfigException.writeObject(ConfigException.java:58)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:601)
> 	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:975)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1480)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> 	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
> 	at scala.collection.immutable.$colon$colon.writeObject(List.scala:379)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:601)
> 	at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:975)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1480)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> 	at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)
> 	at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1493)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1416)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> 	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
> 	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:28)
> 	at org.apache.spark.scheduler.ResultTask$.serializeInfo(ResultTask.scala:48)
> 	at org.apache.spark.scheduler.ResultTask.writeExternal(ResultTask.scala:123)
> 	at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1443)
> 	at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1414)
> 	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
> 	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346)
> 	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:28)
> 	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:48)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:778)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:724)
> 	at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:554)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
> 	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> 	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> 	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> 	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
>