Posted to dev@spark.apache.org by Daniel Li <da...@gmail.com> on 2015/08/06 10:27:47 UTC

Why SparkR didn't reuse PythonRDD

On behalf of Renyi Xiong -

While reading the Spark codebase, it looks to me like PythonRDD.scala is
reusable. I wonder why SparkR chose to implement its own RRDD.scala?

thanks
Daniel
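
For context on what reuse would mean here: both classes boil down to an
RDD of serialized byte arrays whose compute() pipes each parent partition
through an external worker process (a Python or R interpreter). A rough
sketch of that shared shape follows -- simplified and illustrative only;
ExternalWorkerRDD and startWorker are hypothetical names, not actual
Spark classes:

    import java.io.{DataInputStream, DataOutputStream}

    import org.apache.spark.{Partition, TaskContext}
    import org.apache.spark.rdd.RDD

    // Illustrative sketch, not the actual Spark source: both
    // PythonRDD and (Spark 1.x) RRDD roughly fit this shape -- an
    // RDD of serialized records that pipes each parent partition
    // through an external worker process.
    abstract class ExternalWorkerRDD(parent: RDD[_])
      extends RDD[Array[Byte]](parent) {

      // Start the language-specific worker and return its streams
      // (hypothetical helper; each implementation does this differently).
      protected def startWorker(): (DataOutputStream, DataInputStream)

      override protected def getPartitions: Array[Partition] =
        parent.partitions

      override def compute(split: Partition, context: TaskContext): Iterator[Array[Byte]] = {
        val (toWorker, fromWorker) = startWorker()
        // Stream the parent partition to the worker and read the
        // transformed records back. Exactly what else flows over
        // these streams (accumulators, broadcasts, error frames) is
        // where the Python and R implementations diverge.
        ???
      }
    }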

Re: Why SparkR didn't reuse PythonRDD

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
PythonRDD.scala has a number of PySpark-specific conventions (for
example, worker reuse and exception handling) and PySpark-specific
protocols (e.g. for communicating accumulators and broadcasts between
the JVM and Python). While it might be possible to refactor the two
classes to share more code, I don't think it's worth making the code
more complex in order to do that.
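
To make the protocol point concrete: PySpark's worker protocol is
length-prefixed, and negative lengths are control codes for things like
exceptions, timing data, and end-of-data -- none of which the R side
shares. A rough sketch of that dispatch follows; the constant values
match PySpark's SpecialLengths, but PySparkFraming and readResults are
hypothetical names and the loop is a simplification of the real read
path, not the actual PythonRDD source:

    import java.io.DataInputStream
    import scala.collection.mutable.ArrayBuffer

    object PySparkFraming {
      // Control codes as in PySpark's SpecialLengths.
      val END_OF_DATA_SECTION = -1
      val PYTHON_EXCEPTION_THROWN = -2
      val TIMING_DATA = -3
      val END_OF_STREAM = -4

      // Simplified read loop: a 4-byte length precedes each record,
      // and negative lengths are dispatched as control codes.
      def readResults(in: DataInputStream): Seq[Array[Byte]] = {
        val records = ArrayBuffer.empty[Array[Byte]]
        var done = false
        while (!done) {
          in.readInt() match {
            case len if len >= 0 =>
              val record = new Array[Byte](len)
              in.readFully(record) // an ordinary serialized record
              records += record
            case TIMING_DATA =>
              // boot / init / finish timestamps for task metrics
              for (_ <- 1 to 3) in.readLong()
            case PYTHON_EXCEPTION_THROWN =>
              // a serialized Python traceback follows; surface it on the JVM
              val len = in.readInt()
              val err = new Array[Byte](len)
              in.readFully(err)
              throw new RuntimeException(new String(err, "UTF-8"))
            case END_OF_DATA_SECTION =>
              // in the real protocol, accumulator updates follow here
              done = true
          }
        }
        records.toSeq
      }
    }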

Thanks
Shivaram

On Thu, Aug 6, 2015 at 1:27 AM, Daniel Li <da...@gmail.com> wrote:
> On behalf of Renyi Xiong -
>
> While reading the Spark codebase, it looks to me like PythonRDD.scala is
> reusable. I wonder why SparkR chose to implement its own RRDD.scala?
>
> thanks
> Daniel
