Posted to dev@spark.apache.org by Nick Pentreath <ni...@gmail.com> on 2013/10/25 08:41:21 UTC

Julia bindings

Hi Spark Devs

If you could pick one language binding to add to Spark, what would it be?
Probably Clojure or JRuby if the JVM is of interest.

I'm quite excited about Julia as a language for scientific computing (
http://julialang.org). The Julia community has been very focused on interop
with R, Matlab, and, perhaps most of all, Python (see
https://github.com/stevengj/PyCall.jl and
https://github.com/stevengj/PyPlot.jl for example).

Anyway, this is a bit of a thought experiment, but I'd imagine a Julia API
would be similar in principle to the Python API. On the Spark/Java side, it
would likely be almost the same as the existing Python support. On the Julia
side, I'd imagine the major sticking point would be serialising functions and
closures (i.e. an equivalent of the PiCloud cloudpickle code that PySpark uses).
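
To make the shape concrete, here's a purely hypothetical sketch in Julia
(none of these names exist anywhere; they're invented to mirror PySpark's API):

    # Purely hypothetical: SparkContext, text_file, flat_map, reduce_by_key
    # etc. are invented names that just mirror the shape of PySpark.
    sc = SparkContext(master = "local", appname = "wordcount")

    lines  = text_file(sc, "data.txt")
    words  = flat_map(lines, l -> split(l))
    pairs  = map(words, w -> (w, 1))
    counts = reduce_by_key(pairs, +)
    collect(counts)

    # Every closure above (l -> split(l), w -> (w, 1), +) would have to be
    # serialised and shipped to the workers, which is the sticking point.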

I actually played around with PyCall and was able to call PySpark from the
Julia console. You can run arbitrary PySpark code (though the syntax is a
bit ugly), and it mostly seemed to work.

However, when I tried to pass in a Julia function or closure, it failed at
the serialization step.
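
Roughly, the session looked like this (a reconstruction, written with current
PyCall syntax, and assuming pyspark is importable from the Python that PyCall
is linked against):

    using PyCall

    # Assumes the pyspark package is on the Python side's PYTHONPATH.
    pyspark = pyimport("pyspark")
    sc = pyspark.SparkContext("local", "julia-test")

    rdd = sc.parallelize([1, 2, 3, 4])
    rdd.count()          # works: the whole pipeline stays on the Python side

    # But handing PySpark a Julia closure breaks: PyCall wraps it as an opaque
    # Python callable, and PySpark's pickler can't serialise that wrapper.
    rdd.map(x -> 2x).collect()   # fails at the serialisation step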

So one option would be to figure out how to serialise the required functions
and closures on the Julia side and to use PyCall for the interop. The
Julia <-> Python <-> Java hops could add a fair bit of overhead, so it's
perhaps not worth it. Still, the idea of using Spark for the distributed
computing part while mixing and matching Python code/libraries and Julia
code/libraries for things like stats/machine learning is very appealing!
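
For the Julia side of that, the built-in serialiser can at least round-trip a
closure in-process, which gives a feel for what's involved (a sketch; the open
problem is making it work across processes):

    using Serialization   # stdlib in Julia 1.x; serialize() lived in Base early on

    f = x -> 2x + 1                   # a closure we'd like to ship to a worker

    buf = IOBuffer()
    serialize(buf, f)                 # write the closure to a byte stream
    bytes = take!(buf)                # these bytes are what would go on the wire

    g = deserialize(IOBuffer(bytes))
    g(20)                             # => 41

    # Caveat: Julia's serialiser identifies functions by type, so the receiving
    # process needs compatible definitions loaded. Solving that for arbitrary
    # closures is exactly what a cloudpickle-style library would have to do.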

Thoughts?

Nick

Re: Julia bindings

Posted by Matei Zaharia <ma...@gmail.com>.
Hi Nick,

This would definitely be interesting to explore, especially if the Julia folks are open to supporting other parallel compute engines. In terms of technical work, the toughest part will likely be capturing Julia functions and shipping them across the network, as you said. It all depends on how easy that is within that language. Beyond that, you may want to ask them for JVM bindings. There is lots of software that uses the JVM, so it might not be a bad idea to add them. I would avoid going through Python if possible, unless you specifically think mixing those libraries is important (but even then it might be possible to do that in a different way, e.g. by calling Python libraries from Julia directly).
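
As one possible avenue for those JVM bindings (an assumption for illustration,
not something settled here), the JavaCall.jl package can embed a JVM inside
the Julia process:

    using JavaCall

    JavaCall.init(["-Xmx128M"])       # start an embedded JVM

    # The canonical JavaCall example: call a static Java method.
    JMath = @jimport java.lang.Math
    jcall(JMath, "max", jint, (jint, jint), 3, 4)   # => 4

A binding built on something like that could talk to Spark's Java API
directly and skip the Python hop entirely, though shipping Julia closures to
the executors would remain the hard part.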

Matei
