Posted to user@spark.apache.org by harirajaram <ha...@gmail.com> on 2015/07/13 14:30:46 UTC

Share RDD from SparkR and another application

Hello,
I would like to share an RDD between an application and SparkR.
I understand we have spark-jobserver and the IBM Spark Kernel for sharing a
context across different applications, but I am not sure how to use them with
SparkR, as it is essentially a front end (an R shell) to Spark.
Any insights appreciated.

Hari







RE: Share RDD from SparkR and another application

Posted by "Sun, Rui" <ru...@intel.com>.
Hi, Hari,

I don't think job-server can work with SparkR (or PySpark). It seems technically possible, but it would need support from both job-server and SparkR (or PySpark), which doesn't exist yet.

But there may be some indirect ways of sharing RDDs between SparkR and an application. For example, you could save the RDD in the application as a text file and later load it in SparkR to create a SparkR RDD, or vice versa. (Unfortunately, SparkR currently only supports creating RDDs from text files, not from arbitrary Hadoop files.) It would be easier if your application can use DataFrames: SparkR can create and manipulate DataFrames from virtually any external data source, so DataFrames can serve as an intermediate medium for the data exchange.
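
For example, the application side could look roughly like this (an untested sketch against the Spark 1.4-era Scala API; the app name, schema, and paths are placeholders, not anything from this thread):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // In the producing application: write the DataFrame as Parquet
    // on storage that the SparkR shell can also reach (e.g. HDFS).
    val sc = new SparkContext(new SparkConf().setAppName("producer"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "value")
    df.write.mode("overwrite").parquet("/tmp/shared_df")  // placeholder path

In the SparkR shell, df <- read.df(sqlContext, "/tmp/shared_df", "parquet") should then load the same data, and write.df() can move results in the other direction.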


Re: Share RDD from SparkR and another application

Posted by harirajaram <ha...@gmail.com>.
A small correction to what I typed: it is not RDDBackend, it is RBackend. Sorry.





Re: Share RDD from SparkR and another application

Posted by harirajaram <ha...@gmail.com>.
I appreciate your reply.
Yes, you are right that putting the data in Parquet etc. and reading it from
the other app would work. If it were not SparkR, I would rather use
spark-jobserver or the IBM kernel to achieve the same thing, as they give more
flexibility/scalability.
Anyway, I have found a way to run R for my POC from my existing app using
PipedRDD. I feel there are things to improve there, but it gives me a good
start. Internally, the RDDBackend service runs Netty, so I will poke around
these concepts further to understand them better.
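
Roughly, the idea looks like this (a simplified, untested sketch; the script path is a placeholder, and the script just reads lines from stdin and writes one line per result to stdout):

    // In the existing Scala application: pipe each element of the RDD
    // through an external R script installed on every worker node.
    // sc is the application's existing SparkContext.
    val nums = sc.parallelize(1 to 10).map(_.toString)
    val piped = nums.pipe("Rscript /opt/scripts/process.R")  // placeholder path
    piped.collect().foreach(println)

Each element is written to the script's standard input, and every line the script prints becomes an element of the resulting RDD[String].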

Thanks
Hari




---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org