Posted to issues@spark.apache.org by "Catalin Alexandru Zamfir (JIRA)" <ji...@apache.org> on 2016/07/05 15:54:11 UTC

[jira] [Commented] (SPARK-15582) Support for Groovy closures

    [ https://issues.apache.org/jira/browse/SPARK-15582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362669#comment-15362669 ] 

Catalin Alexandru Zamfir commented on SPARK-15582:
--------------------------------------------------

"Open-sourced" the Groovy fix/work-around implementation @ https://github.com/Antauri/spark-groovy-support and explained what happens beneath the covers. Compatible with Spark 1.6.2 (requires the SPARK-13599 commit fix for Spark 1.6.1). 

Since the release of Spark 1.6.2 we've been able to use Groovy (with the above implementation) to write our analytics scripts against the official Spark distribution, which is fantastic from a maintenance point of view.
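
For anyone who just wants the gist without reading the repository: the general idea is to get the byte-code Groovy generates for the script (the script class plus its closure classes) onto the executors before the job runs. Below is a minimal sketch of that idea, not the exact code from the repository; compileScriptToJar is a made-up helper name, and sparkContext/scriptText are assumed to be the usual context and script text.
{noformat}
import org.codehaus.groovy.control.CompilationUnit
import org.codehaus.groovy.control.CompilerConfiguration
import org.codehaus.groovy.control.Phases

import java.util.jar.JarEntry
import java.util.jar.JarOutputStream

// Compile the script text ahead of time and write every generated class
// (the script class plus its Script1$_run_closureN companions) into a jar.
def compileScriptToJar(String scriptText, File jarFile) {
    def unit = new CompilationUnit(new CompilerConfiguration())
    // NOTE: the source name must yield the same class name that GroovyShell
    // will use later ("Script1" for the first evaluated script), otherwise
    // the jar will not contain the class the executor asks for.
    unit.addSource("Script1.groovy", scriptText)
    unit.compile(Phases.CLASS_GENERATION)

    new JarOutputStream(new FileOutputStream(jarFile)).withCloseable { jar ->
        unit.classes.each { generated ->
            jar.putNextEntry(new JarEntry(generated.name.replace('.', '/') + ".class"))
            jar.write(generated.bytes)
            jar.closeEntry()
        }
    }
}

// Usage sketch: register the jar so the executors can resolve the closure
// classes by name, then evaluate the script on the driver as before.
def jar = File.createTempFile("groovy-job", ".jar")
compileScriptToJar(scriptText, jar)
sparkContext.addJar(jar.absolutePath)
new GroovyShell().evaluate(scriptText)
{noformat}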

> Support for Groovy closures
> ---------------------------
>
>                 Key: SPARK-15582
>                 URL: https://issues.apache.org/jira/browse/SPARK-15582
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, Java API
>    Affects Versions: 1.6.1, 1.6.2, 2.0.0
>         Environment: 6 node Debian 8 based Spark cluster
>            Reporter: Catalin Alexandru Zamfir
>
> After applying the SPARK-13599 fix for our Groovy dependencies (which did resolve that issue) and running one of our jobs against it, we see the Spark executors stuck on a ClassNotFoundException when the job runs as a script (via GroovyShell.evaluate (scriptText)). It seems Spark either cannot de-serialize the closure, or the closure never reaches the executor.
> {noformat}
> sparkContext.binaryFiles (ourPath).flatMap ({ onePathEntry -> code-block } as FlatMapFunction).count ();
> { onePathEntry -> code-block } denotes a Groovy closure.
> {noformat}
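> To make it concrete, this is roughly what that call looks like written out in full. It is only a sketch: ourPath and sparkContext are the placeholders from above, and the closure body here simply returns the file path. Each closure literal in the script compiles to its own class (Script1$_run_closure1 in the stack-trace below), and an instance of that class is what gets serialized into the task, so the executor must be able to load it by name.
> {noformat}
> import org.apache.spark.api.java.function.FlatMapFunction
>
> sparkContext.binaryFiles (ourPath)
>     .flatMap ({ onePathEntry ->
>         // onePathEntry is a Tuple2<String, PortableDataStream>; returning an
>         // Iterable satisfies the Spark 1.6 FlatMapFunction contract
>         [onePathEntry._1()]
>     } as FlatMapFunction)
>     .count ()
> {noformat}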
> There is a groovy-spark example @ https://github.com/bunions1/groovy-spark-example ... however it relies on a modified Groovy build. If my understanding is correct, Groovy compiles down to regular JVM byte-code, which should make it easy for Spark to pick up and use the closures.
> The above example code fails with this stack-trace:
> {noformat}
> Caused by: java.lang.ClassNotFoundException: Script1$_run_closure1
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> 	at java.lang.Class.forName0(Native Method)
> 	at java.lang.Class.forName(Class.java:348)
> 	at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
> 	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
> 	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
> 	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
> 	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:89)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> {noformat}
> Any ideas on how to tackle this are welcome. I've tried Googling around for similar issues, but nobody seems to have found a solution.
> At least, point me to where to "hack" to make Spark support closures and I'd gladly share some of my time to make it work. SPARK-2171 argues that this works out of the box, but for projects of relatively complex size, where the driver application is contained in / part of a bigger application and runs on a cluster, things do not seem to work. I don't know whether SPARK-2171 was ever tried outside a local[] cluster set-up, where such issues can arise.
> I saw a couple of people trying to make this work, but again they resort to work-arounds (e.g. distributing the byte-code manually before it is needed, adding a JAR with addJar, and similar).
> Can this be done? Where can we look?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org