Posted to user@spark.apache.org by Krakna H <sh...@gmail.com> on 2014/03/16 13:59:40 UTC

Contributing pyspark ports

Is there any documentation on contributing pyspark ports of additions to
Spark? I only see guidelines on Scala contributions (
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
Specifically, I'm interested in porting mllib and graphx contributions.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Contributing-pyspark-ports-tp2714.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Contributing pyspark ports

Posted by Matei Zaharia <ma...@gmail.com>.
Unfortunately there isn’t a guide, but you can read a PySpark internals overview at https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals. This would be the thing to follow.

In terms of MLlib and GraphX, I think MLlib will be easier to expose at first — it’s designed to be easy to call from Java, and we’ve already created bindings for many of the algorithms that connect with NumPy. (A couple of new algorithms have been added since then though.) GraphX currently isn’t easy to call from Java and will be even harder to deal with in Python. I’d start with a Java API for it first.

BTW in both of these we want to call the JVM codebase from Python. That will be a lot more efficient than implementing the same code in Python, and more maintainable as well.
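[Editor's note: a toy sketch of the delegation pattern Matei describes — a thin Python wrapper that forwards calls to a JVM-side object, as PySpark's MLlib bindings do through a Py4J gateway. The class names and the JvmModelStub (which stands in for the JVM object so the sketch runs without a JVM) are hypothetical, not part of Spark's API.]

```python
class JvmModelStub:
    """Hypothetical stand-in for a JVM model object reached via Py4J.

    In real PySpark, calls on this object would cross the gateway
    into the JVM; here it is plain Python so the sketch is runnable.
    """

    def __init__(self, weights):
        self._weights = weights

    def predict(self, features):
        # The actual computation lives on the JVM side.
        return sum(w * x for w, x in zip(self._weights, features))


class PyModelWrapper:
    """Python-facing API that delegates the heavy lifting to the JVM.

    This is the shape Matei is recommending: the Python layer only
    converts inputs/outputs and forwards calls, instead of
    reimplementing the algorithm in Python.
    """

    def __init__(self, jvm_model):
        self._jmodel = jvm_model

    def predict(self, features):
        # Convert the Python-side input (e.g. a NumPy array) into
        # something the JVM side understands, then forward the call.
        return self._jmodel.predict(list(features))


model = PyModelWrapper(JvmModelStub([0.5, 2.0]))
print(model.predict([2.0, 1.0]))  # 3.0
```

The point of the pattern is that only conversion logic lives in Python, so the binding stays small and the JVM implementation remains the single source of truth.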

Matei
