Posted to user@spark.apache.org by Usman Masood <us...@locu.com> on 2013/08/06 02:59:07 UTC

Writing your own RDD

Hey,

Is it possible to write a custom RDD in Python using PySpark? We have an
HTTP API for reading time-series data that supports range scans (so it
should be easy to partition the data), and we're considering using Spark
to analyze it. If we can't write an RDD in Python, is it possible to
write one in Scala and then make use of it in Python land?

Usman

Re: Writing your own RDD

Posted by Matei Zaharia <ma...@gmail.com>.
Hi Usman,

I believe the easiest way would be to create an RDD of Strings in Java or Scala. It's pretty easy to wrap that into a PySpark RDD object. For example, take a look at pyspark.SparkContext.textFile in context.py: it just creates a Java RDD of Strings and then wraps it.

Matei
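
For a concrete picture of what Matei suggests, here is a minimal sketch of a custom Scala RDD of Strings partitioned by time range. It is written against the current org.apache.spark package names (the classes lived under spark.* in 2013-era releases), and fetchRange is a hypothetical stand-in for whatever HTTP client call performs a range scan; neither it nor TimeSeriesRDD is from the thread itself.

import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.api.java.JavaRDD
import org.apache.spark.rdd.RDD

// One partition per time range; start/end are the bounds that a
// range scan against the HTTP API would be issued with.
case class TimeRangePartition(index: Int, start: Long, end: Long)
  extends Partition

// A minimal RDD of Strings over a range-scannable source. fetchRange
// stands in for the HTTP client call and must be serializable, since
// compute() runs inside tasks on the workers.
class TimeSeriesRDD(
    sc: SparkContext,
    startMs: Long,
    endMs: Long,
    numPartitions: Int,
    fetchRange: (Long, Long) => Iterator[String])
  extends RDD[String](sc, Nil) {  // Nil: no parent RDDs

  // Split [startMs, endMs) into equal time ranges, one per partition.
  override def getPartitions: Array[Partition] = {
    val step = (endMs - startMs) / numPartitions
    Array.tabulate[Partition](numPartitions) { i =>
      val start = startMs + i * step
      val end = if (i == numPartitions - 1) endMs else start + step
      TimeRangePartition(i, start, end)
    }
  }

  // Each task performs one range scan against the API.
  override def compute(split: Partition, context: TaskContext): Iterator[String] = {
    val p = split.asInstanceOf[TimeRangePartition]
    fetchRange(p.start, p.end)
  }
}

object TimeSeriesRDD {
  // Expose the RDD as a JavaRDD[String] so it is reachable over Py4J
  // and can be wrapped by a PySpark RDD, like textFile's result is.
  def asJavaRDD(rdd: TimeSeriesRDD): JavaRDD[String] = JavaRDD.fromRDD(rdd)
}

On the Python side, the wrapping follows the pattern Matei points to: context.py builds a pyspark RDD around the Java RDD of Strings that the JVM-side textFile returns, and a JavaRDD[String] obtained from a helper like the one above can be wrapped the same way.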
