Posted to users@groovy.apache.org by tog <gu...@gmail.com> on 2015/07/26 11:12:24 UTC
Apache Spark & Groovy
Hi

I am starting to play with Apache Spark using Groovy. I have a small script
<https://gist.github.com/galleon/d6540327c418aa8a479f> that I use for that
purpose.

When the script is turned into a class and launched with java, it works
fine, but it fails when run as a script.

Any idea what I am doing wrong? Maybe some of you have already come
across this problem.
$ groovy -version
Groovy Version: 2.4.3 JVM: 1.8.0_40 Vendor: Oracle Corporation OS: Mac OS X

$ groovy GroovySparkWordcount.groovy
class org.apache.spark.api.java.JavaRDD
true
true
Caught: org.apache.spark.SparkException: Task not serializable
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:311)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:310)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.RDD.filter(RDD.scala:310)
at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78)
at org.apache.spark.api.java.JavaRDD$filter$0.call(Unknown Source)
at GroovySparkWordcount.run(GroovySparkWordcount.groovy:27)
Caused by: java.io.NotSerializableException: GroovySparkWordcount
Serialization stack:
- object not serializable (class: GroovySparkWordcount, value: GroovySparkWordcount@57c6feea)
- field (class: GroovySparkWordcount$1, name: this$0, type: class GroovySparkWordcount)
- object (class GroovySparkWordcount$1, GroovySparkWordcount$1@3db1ce78)
- field (class: org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, name: f$1, type: interface org.apache.spark.api.java.function.Function)
- object (class org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
... 12 more
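For reference, the root cause can be reproduced without Spark at all: Spark ships the filter task with plain Java serialization, and a closure defined at script level keeps a reference to the running script instance, which is not serializable. A minimal sketch (not the gist itself):

```groovy
// A closure defined at script level: its owner is the running script object.
def isEven = { int n -> n % 2 == 0 }

try {
    // Plain Java serialization, which is essentially what Spark attempts
    // before shipping the task to executors.
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(isEven)
    assert false, 'expected serialization to fail'
} catch (NotSerializableException e) {
    // The script class itself is what cannot be serialized, matching the
    // "Caused by: java.io.NotSerializableException" line above.
    println "not serializable: ${e.message}"
}
```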
Re: Apache Spark & Groovy
Posted by tog <gu...@gmail.com>.
Thanks Cédric, I learnt something :-) and it solved my issue.

A few follow-up questions then:

In my script, should Serializable.isAssignableFrom(filterClosure.class)
only return true once I have called dehydrate() on it? (That does not seem
to be the case.)

Would there be a way to automatically create "dehydrated" closures in a
script? Or should I intercept all calls to map on JavaRDD to make sure the
closure is dehydrated before the actual method is called?
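On the first question, the type-level check cannot distinguish the two cases: groovy.lang.Closure itself implements java.io.Serializable, and dehydrate() does not change the class of the closure, it only returns a copy with the owner/delegate/thisObject references nulled. A quick check:

```groovy
def filterClosure = { String s -> s.startsWith('a') }

// The class-level check passes regardless of dehydration...
assert Serializable.isAssignableFrom(filterClosure.class)
assert Serializable.isAssignableFrom(filterClosure.dehydrate().class)

// ...because dehydrate() returns a copy of the very same class, just with
// owner/delegate/thisObject set to null.
assert filterClosure.dehydrate().class == filterClosure.class
```

So the check says nothing about whether the object graph behind a given instance can actually be written out.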
On 26 July 2015 at 11:07, Cédric Champeau <ce...@gmail.com> wrote:
--
PGP KeyID: 2048R/EA31CFC9 subkeys.pgp.net
Re: Apache Spark & Groovy
Posted by Cédric Champeau <ce...@gmail.com>.
A closure keeps a reference to its owner/thisObject, which in your case is
the script. The script is not serializable. If you dehydrate the closure
(call closure.dehydrate()) it will no longer keep a reference to the script
and it should be serializable.
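The fix can be sketched in plain Groovy, with a Java-serialization attempt used as a stand-in for Spark's serializability check:

```groovy
// Stand-in for Spark's serializability check: plain Java serialization.
def serializable = { obj ->
    try {
        new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
        return true
    } catch (NotSerializableException ignored) {
        return false
    }
}

def isEven = { int n -> n % 2 == 0 }
assert !serializable(isEven)        // owner is the script, which is not serializable

def detached = isEven.dehydrate()   // copy with owner/delegate/thisObject nulled
assert serializable(detached)
assert detached(4) && !detached(3)  // still callable: the body uses only its parameter
```

If the closure needs an owner again after deserialization, Closure.rehydrate(delegate, owner, thisObject) re-attaches one; for self-contained bodies like the one above it is unnecessary.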
2015-07-26 11:57 GMT+02:00 Jeff MAURY <je...@jeffmaury.com>:
Re: Apache Spark & Groovy
Posted by Jeff MAURY <je...@jeffmaury.com>.
So it may be an object stored in your task that is not serializable.
Jeff
On 26 Jul 2015 at 11:42, "tog" <gu...@gmail.com> wrote:
Re: Apache Spark & Groovy
Posted by tog <gu...@gmail.com>.
Thanks Jeff for your quick answer.

Yes, the tasks should be serializable, and I believe they are.

My test script has two tasks (doing the same job): one is a closure, the
other an org.apache.spark.api.java.function.Function - and according to a
small test in my script, both are serializable from Java/Groovy.

I am a bit puzzled/stuck here.
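The serialization stack above hints at why the Function variant fails too: it shows a this$0 field on GroovySparkWordcount$1, i.e. the Function was an anonymous inner class, and those carry an implicit reference to the enclosing script. A sketch, with a hypothetical Outer class standing in for the script:

```groovy
// Outer stands in for the script class; note it is not Serializable.
class Outer {
    Runnable makeTask() {
        // Anonymous inner class: the compiler adds a synthetic this$0
        // field referencing the enclosing Outer instance.
        return new Runnable() {
            void run() { println 'working' }
        }
    }
}

def task = new Outer().makeTask()
assert task.class.declaredFields.any { it.name == 'this$0' }
```

Making the task a standalone (or static nested) class removes that implicit reference.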
On 26 July 2015 at 10:34, Jeff MAURY <je...@jeffmaury.com> wrote:
--
PGP KeyID: 2048R/EA31CFC9 subkeys.pgp.net
Re: Apache Spark & Groovy
Posted by Jeff MAURY <je...@jeffmaury.com>.
Spark distributes tasks to cluster nodes, so each task needs to be
serializable. It appears that your task is a Groovy closure, so you must
make it serializable.
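A task shape that serializes cleanly can be sketched without the Spark dependency (a hypothetical StartsWithA class implementing Serializable directly, standing in for Spark's Function interface):

```groovy
// A standalone class: no owner, no this$0 back-reference, only its own logic.
class StartsWithA implements Serializable {
    boolean apply(String s) { s.startsWith('a') }
}

def task = new StartsWithA()
// Serializes without complaint, unlike a script-owned closure.
new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(task)
assert task.apply('apple') && !task.apply('pear')
```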
Jeff
On Sun, Jul 26, 2015 at 11:12 AM, tog <gu...@gmail.com> wrote:
--
Jeff MAURY
"Legacy code" often differs from its suggested alternative by actually
working and scaling.
- Bjarne Stroustrup
http://www.jeffmaury.com
http://riadiscuss.jeffmaury.com
http://www.twitter.com/jeffmaury