Posted to user@spark.apache.org by Dominik Hübner <co...@dhuebner.com> on 2014/09/03 11:42:49 UTC

Exchanging data between PySpark and Scala

Hey,
I am about to implement a Spark app that will require using both PySpark and Spark on Scala.

The data will be read from AWS S3 (compressed CSV files) and must be pre-processed by an existing Python codebase. The final goal, however, is to make those datasets available to Spark apps written in either Python or Scala, e.g. through Tachyon.

S3 => PySpark => Tachyon => {Py, Scala}Spark
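
The first two steps are the part I already know how to do; roughly something like this (a minimal sketch, with a made-up bucket path and a trivial stand-in for our actual pre-processing code):

    from pyspark import SparkContext

    sc = SparkContext(appName="s3-csv-ingest")

    # Hadoop's text input format decompresses .gz files transparently, so the
    # compressed CSVs come back as plain lines (one partition per file, since
    # gzip is not splittable).
    lines = sc.textFile("s3n://my-bucket/raw/2014-09/*.csv.gz")

    def preprocess(line):
        # Stand-in for the existing Python pre-processing codebase;
        # here it only splits the CSV line into fields.
        return line.split(",")

    records = lines.map(preprocess)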

Is there a recommended way to pass data between Spark applications implemented in different languages? I have thought about using a serialisation framework like Thrift or Avro, but maybe there are other ways to do this (ideally without writing CSV files). I am open to any kind of input!
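
To make the question a bit more concrete: one shape I could imagine for the PySpark => Tachyon hand-off is Spark SQL's Parquet support, along the lines of the sketch below (written against the Spark 1.1 Python API; the Tachyon URI, schema and field names are invented, and I have not verified this against our Tachyon setup):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row

    sc = SparkContext(appName="handoff-to-tachyon")
    sqlContext = SQLContext(sc)

    # A couple of pre-processed records expressed as Rows; the field names
    # (user_id, score) are invented for the example.
    rows = sc.parallelize([(1, 0.5), (2, 0.7)]) \
             .map(lambda r: Row(user_id=r[0], score=r[1]))

    # inferSchema yields a SchemaRDD whose schema is stored together with
    # the data once it is written out as Parquet.
    schema_rdd = sqlContext.inferSchema(rows)

    # Parquet is language-neutral, so both the Python and the Scala apps can
    # read the same files back; the tachyon:// URI below is made up.
    schema_rdd.saveAsParquetFile("tachyon://tachyon-master:19998/datasets/preprocessed")

The Scala side would then read the same path back via sqlContext.parquetFile(...). I am not sure whether that, or something Avro/Thrift based, is the more idiomatic route.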