Posted to user@spark.apache.org by "Fiske, Danny" <Da...@ext.ons.gov.uk> on 2019/07/15 13:58:32 UTC
[PySpark] [SparkR] Is it possible to invoke a PySpark function with a SparkR DataFrame?
Hi all,
Forgive this naïveté, I'm looking for reassurance from some experts!
In the past we created a tailored Spark library for our organisation, implementing Spark functions in Scala with Python and R "wrappers" on top, but the focus on Scala has alienated our analysts/statisticians/data scientists, and collaboration is important for us (yeah... we're aware that your SDKs are very similar across languages... :/ ). We'd like to see if we could forgo the Scala facet in order to present the source code in a language more familiar to users and internal contributors.
We'd ideally write our functions with PySpark and potentially create a SparkR "wrapper" over the top, leading to the question:
Given a function written with PySpark that accepts a DataFrame parameter, is there a way to invoke this function using a SparkR DataFrame?
Is there any reason to pursue this? Is it even possible?
Many thanks,
Danny
For the latest data on the economy and society, consult our website at http://www.ons.gov.uk
***********************************************************************************************
Please Note: Incoming and outgoing email messages are routinely monitored for compliance with our policy
on the use of electronic communications
***********************************************************************************************
Legal Disclaimer: Any views expressed by the sender of this message are not necessarily those of the
Office for National Statistics
***********************************************************************************************
Re: [PySpark] [SparkR] Is it possible to invoke a PySpark function with a SparkR DataFrame?
Posted by Felix Cheung <fe...@hotmail.com>.
Not currently in Spark.
However, there are systems out there that can share a DataFrame between languages on top of Spark. That isn't calling the Python UDF directly, but you can pass the DataFrame to Python and then apply the function there with .map(UDF).
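As a rough illustration of the "pass the DataFrame to Python" idea (a sketch only, not something Spark provides out of the box): assume an environment where an R interpreter and a Python interpreter are attached to the SAME SparkSession, which is an assumption about the hosting system, not a Spark feature. The R side could register its DataFrame with createOrReplaceTempView(r_df, "shared_in"), and the Python side could look it up by name. The helper name run_on_shared_view and the view names below are hypothetical.

```python
# Python side of a hypothetical hand-off between SparkR and PySpark.
# Assumption: the R side has already run
#     createOrReplaceTempView(r_df, "shared_in")
# against the same SparkSession that `spark` refers to here.
# Temp views are session-scoped, so this does not work across sessions.

def run_on_shared_view(spark, in_view, out_view, func):
    """Look up a temp view, apply a PySpark function, publish the result.

    spark    -- the shared SparkSession
    in_view  -- name of the temp view registered by the other language
    out_view -- name under which the result is registered
    func     -- any function taking a DataFrame and returning a DataFrame
    """
    df = spark.table(in_view)                 # view name -> PySpark DataFrame
    result = func(df)                         # the existing PySpark function
    result.createOrReplaceTempView(out_view)  # make the result visible by name
    return result
```

Under the same assumption, the R side could then read the result back with something like sql("SELECT * FROM shared_out"). The data never leaves Spark; only the view name is exchanged between the two languages.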