Posted to dev@spark.apache.org by Holden Karau <ho...@pigscanfly.ca> on 2016/05/15 21:40:51 UTC

PySpark mixed with Jython

I've been looking at EclairJS (Spark + JavaScript), which takes a
really interesting approach: the driver program runs in Node and the
workers run in Nashorn. I was wondering if anyone has given much thought
to optionally exposing a similar interface for PySpark. For some UDFs
and UDAFs we could keep the data entirely in the JVM by evaluating them
with Jython, and still fall back to our old PipelinedRDD-based interface
for operations that require native libraries or otherwise aren't
supported in Jython. Have I had too much coffee and this is actually a
bad idea, or is this something people think would be worth investigating
further?
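
To make the idea concrete, here's a rough sketch (hypothetical; the
object and function names are made up, and the real Spark wiring would
be far more involved) of the Jython path, using Jython's
PythonInterpreter to run a Python function in-process so rows never
leave the JVM:

    import org.python.util.PythonInterpreter
    import org.python.core.{PyFunction, PyInteger, PyObject}

    object JythonUdfSketch {
      def main(args: Array[String]): Unit = {
        val interp = new PythonInterpreter()

        // Compile the user's Python function once (per executor, say).
        interp.exec("def plus_one(x):\n    return x + 1")
        val fn: PyFunction = interp.get("plus_one", classOf[PyFunction])

        // Apply it to JVM values directly -- no pickling and no
        // round trip to a CPython worker process, unlike the
        // PipelinedRDD path.
        val out: PyObject = fn.__call__(new PyInteger(41))
        println(out.asInt()) // prints 42
      }
    }

The win over the PipelinedRDD path would be skipping the serialization
round trip to the CPython worker; the cost is that Jython can't load
C extensions, hence the fallback mentioned above.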

-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau