Posted to issues@spark.apache.org by "Marius Van Niekerk (JIRA)" <ji...@apache.org> on 2016/11/30 03:40:58 UTC

[jira] [Commented] (SPARK-15369) Investigate selectively using Jython for parts of PySpark

    [ https://issues.apache.org/jira/browse/SPARK-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707411#comment-15707411 ] 

Marius Van Niekerk commented on SPARK-15369:
--------------------------------------------

I've started an initial attempt at turning this into a Spark package:

https://github.com/mariusvniekerk/spark-jython-udf

Feedback would be appreciated.

> Investigate selectively using Jython for parts of PySpark
> ---------------------------------------------------------
>
>                 Key: SPARK-15369
>                 URL: https://issues.apache.org/jira/browse/SPARK-15369
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: holdenk
>            Priority: Minor
>
> Transferring data from the JVM to the Python executor can be a substantial bottleneck. While Jython is not suitable for all UDFs or map functions, it may be suitable for some simple ones. We should investigate the option of using Jython to accelerate these small functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)