Posted to user@spark.apache.org by Jeremy Liu <je...@gmail.com> on 2018/03/27 17:04:24 UTC

[Spark R] Proposal: Exposing RBackend in RRunner

Spark Users,

In SparkR, RBackend is created in RRunner.main(). In particular, this makes
it difficult to control or use the RBackend. For my use case, I am looking
to access the JVMObjectTracker that RBackend maintains for SparkR
DataFrames.
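For context, the JVMObjectTracker is essentially a map from generated string ids to JVM objects: only the opaque id crosses the R/JVM boundary. A minimal Python sketch of that idea (all names here are illustrative, not the actual SparkR internals):

```python
import itertools
import threading


class ObjectTracker:
    """Toy id -> object registry, in the spirit of SparkR's
    JVMObjectTracker (names and details here are illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}
        self._ids = itertools.count()

    def add(self, obj):
        # Hand out an opaque string id; only the id crosses the boundary.
        with self._lock:
            obj_id = str(next(self._ids))
            self._objects[obj_id] = obj
            return obj_id

    def get(self, obj_id):
        with self._lock:
            return self._objects[obj_id]

    def remove(self, obj_id):
        # Invoked when the frontend-side reference is released.
        with self._lock:
            return self._objects.pop(obj_id, None)
```

Because the runner owns the only instance of this map, code outside RRunner.main() has no way to reach the tracked objects, which is the crux of the problem below.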

Analogously, PySpark starts a py4j.GatewayServer in PythonRunner.main().
It's then possible to start a ClientServer that has access to the
object bindings between Python and Java.
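To illustrate why attaching a second server helps: as long as both endpoints resolve handles against the same binding table, an object created through one is reachable from the other. A language-agnostic toy sketch in Python (no py4j involved; the names are made up):

```python
class BindingTable:
    """Toy shared table of handle -> object bindings."""

    def __init__(self):
        self._bindings = {}

    def put(self, handle, obj):
        self._bindings[handle] = obj

    def resolve(self, handle):
        return self._bindings[handle]


class Endpoint:
    """Toy frontend that resolves objects through a shared table, the way
    a second server could reuse a gateway's existing bindings."""

    def __init__(self, table):
        self._table = table

    def lookup(self, handle):
        return self._table.resolve(handle)


shared = BindingTable()
runner_server = Endpoint(shared)    # stands in for the runner-created server
attached_client = Endpoint(shared)  # stands in for a later-attached client
shared.put("df1", {"rows": 10})
```

The point is only that the second endpoint shares state with the first rather than creating its own; in the SparkR case there is no supported way to attach such a second endpoint to the RBackend.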

Is there something similar for SparkR? Or a reasonable way to expose
RBackend?

Thanks!


-- 
-----
Jeremy Liu
jeremy.jl.liu@gmail.com

Re: [Spark R] Proposal: Exposing RBackend in RRunner

Posted by Felix Cheung <fe...@hotmail.com>.
Automatic reference counting should already be handled by SparkR.

Can you elaborate on which object and how that would be used?

________________________________
From: Jeremy Liu <je...@gmail.com>
Sent: Thursday, March 29, 2018 8:23:58 AM
To: Reynold Xin
Cc: Felix Cheung; dev@spark.apache.org
Subject: Re: [Spark R] Proposal: Exposing RBackend in RRunner

The use case is to cache a reference to a JVM object created by SparkR.


Re: [Spark R] Proposal: Exposing RBackend in RRunner

Posted by Jeremy Liu <je...@gmail.com>.
The use case is to cache a reference to a JVM object created by SparkR.
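In other words, the goal is to hold a strong reference so the object outlives the frontend handle that created it. A small Python illustration of the caching idea (hypothetical names; SparkR's actual reference counting is internal):

```python
import weakref


class JvmHandle:
    """Stand-in for a tracked JVM-side object (illustrative only)."""


cache = {}


def create_dataframe():
    obj = JvmHandle()
    cache["df1"] = obj  # caching a strong reference pins the object
    return weakref.ref(obj)


ref = create_dataframe()
```

As long as the cache entry exists, the object stays reachable; once it is dropped, the object can be collected. Doing the equivalent for SparkR objects requires a way to reach the tracker that owns them, hence the request to expose RBackend.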

On Wed, Mar 28, 2018 at 12:03 PM Reynold Xin <rx...@databricks.com> wrote:

> If you need the functionality, I would recommend just copying the code
> over to your project and using it that way.
>
--
-----
Jeremy Liu
jeremy.jl.liu@gmail.com

Re: [Spark R] Proposal: Exposing RBackend in RRunner

Posted by Reynold Xin <rx...@databricks.com>.
If you need the functionality, I would recommend just copying the code
over to your project and using it that way.

On Wed, Mar 28, 2018 at 9:02 AM Felix Cheung <fe...@hotmail.com>
wrote:

> I think the difference is that py4j is a public library, whereas the R
> backend is specific to SparkR.
>
> Can you elaborate on what you need JVMObjectTracker for? We have provided
> convenient R APIs to call into the JVM: sparkR.callJMethod, for example.

Re: [Spark R] Proposal: Exposing RBackend in RRunner

Posted by Felix Cheung <fe...@hotmail.com>.
I think the difference is that py4j is a public library, whereas the R backend is specific to SparkR.

Can you elaborate on what you need JVMObjectTracker for? We have provided convenient R APIs to call into the JVM: sparkR.callJMethod, for example.
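For readers unfamiliar with it: sparkR.callJMethod takes a Java object reference, a method name, and arguments, and dispatches the call JVM-side. The dispatch itself is just name-based method resolution on a tracked object; a toy Python analogue (names illustrative, not the SparkR protocol):

```python
class ToyBackend:
    """Toy backend dispatching method calls on tracked objects by id,
    loosely analogous to what a callJMethod request triggers JVM-side."""

    def __init__(self):
        self._objects = {}

    def track(self, obj_id, obj):
        self._objects[obj_id] = obj

    def call_method(self, obj_id, method, *args):
        # Resolve the handle, then dispatch by method name (reflection).
        target = self._objects[obj_id]
        return getattr(target, method)(*args)


backend = ToyBackend()
backend.track("list1", [3, 1, 2])
```

This covers invoking methods on objects SparkR already tracks; what it does not give you is direct access to the tracker itself, which is what the original question asks for.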




Re: [Spark R] Proposal: Exposing RBackend in RRunner

Posted by Jeremy Liu <je...@gmail.com>.
Spark Dev,

On second thought, the below topic seems more appropriate for spark-dev
rather than spark-users:

Spark Users,
>
> In SparkR, RBackend is created in RRunner.main(). In particular, this makes
> it difficult to control or use the RBackend. For my use case, I am looking
> to access the JVMObjectTracker that RBackend maintains for SparkR
> dataframes.
>
> Analogously, pyspark starts a py4j.GatewayServer in PythonRunner.main().
> It's then possible to start a ClientServer that has access to the
> object bindings between Python and Java.
>
> Is there something similar for SparkR? Or a reasonable way to expose
> RBackend?
>
> Thanks!
>
-- 
-----
Jeremy Liu
jeremy.jl.liu@gmail.com