Posted to dev@spark.apache.org by Chris Freeman <cf...@alteryx.com> on 2015/12/09 19:11:22 UTC

Specifying Scala types when calling methods from SparkR

Hey everyone,

I’m currently looking at ways to save out SparkML model objects from SparkR and I’ve had some luck putting the model into an RDD and then saving the RDD as an Object File. Once it’s saved, I’m able to load it back in with something like:

sc.objectFile[LinearRegressionModel]("path/to/model")

I’d like to replicate this same process from SparkR using the JVM backend APIs (e.g. “callJMethod”), but so far I haven’t had any luck, and I’m guessing that’s (at least in part) due to the need to specify the type when calling the objectFile method.

Does anyone know if this is actually possible? For example, here’s what I’ve come up with so far:

loadModel <- function(sc, modelPath) {
  modelRDD <- SparkR:::callJMethod(sc,
                                   "objectFile[PipelineModel]",
                                   modelPath,
                                   SparkR:::getMinPartitions(sc, NULL))
  return(modelRDD)
}

Any help is appreciated!

--
Chris Freeman


RE: Specifying Scala types when calling methods from SparkR

Posted by "Sun, Rui" <ru...@intel.com>.
Hi, Chris,

I see your point: the objectFile and saveAsObjectFile pair in SparkR can only be used within a SparkR context, as the contents of the RDD are assumed to be serialized R objects.

It’s fine to drop down to the JVM level in the case where the model is saved as an object file from Scala and then loaded in SparkR. But I don’t understand “but that seems to only work if you specify the type”; shouldn’t there be no need to specify the type, because of type erasure?

Did you try something like this: convert the RDD to a DataFrame, save it, and then load it back as a DataFrame in SparkR and convert it to an RDD?
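
For reference, a rough sketch of that DataFrame round trip (assuming the SparkR 1.x sqlContext-style API; the variable names and path are illustrative, toRDD is an internal SparkR helper, and none of this is tested here):

# Convert the RDD to a DataFrame and write it out via the DataFrame API
modelDF <- SparkR::createDataFrame(sqlContext, modelRDD)
SparkR::write.df(modelDF, "path/to/model_df", source = "parquet", mode = "overwrite")

# Later, possibly from a fresh context: read it back and convert to an RDD
restoredDF <- SparkR::read.df(sqlContext, "path/to/model_df", source = "parquet")
restoredRDD <- SparkR:::toRDD(restoredDF)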

From: Chris Freeman [mailto:cfreeman@alteryx.com]
Sent: Friday, December 11, 2015 2:47 AM
To: Sun, Rui; shivaram@eecs.berkeley.edu
Cc: dev@spark.apache.org
Subject: RE: Specifying Scala types when calling methods from SparkR

Hi Sun Rui,

I’ve had some luck simply using “objectFile” when saving from SparkR directly. The problem is that if you do it that way, the model object will only work if you continue to use the current Spark Context, and I think model persistence should really enable you to use the model at a later time. That’s where I found that I could drop down to the JVM level and interact with the Scala object directly, but that seems to only work if you specify the type.



On December 9, 2015 at 7:59:43 PM, Sun, Rui (rui.sun@intel.com) wrote:
Hi,

Just use "objectFile" instead of "objectFile[PipelineModel]" for callJMethod. You can take the objectFile() in context.R as an example.

Since the SparkContext created in SparkR is actually a JavaSparkContext, there is no need to pass the implicit ClassTag.

-----Original Message-----
From: Shivaram Venkataraman [mailto:shivaram@eecs.berkeley.edu]
Sent: Thursday, December 10, 2015 8:21 AM
To: Chris Freeman
Cc: dev@spark.apache.org
Subject: Re: Specifying Scala types when calling methods from SparkR

The SparkR callJMethod can only invoke methods as they show up in the Java byte code. So in this case you'll need to check the SparkContext byte code (with javap or something like that) to see how that method looks. My guess is the type is passed in as a class tag argument, so you'll need to do something like create a class tag for the LinearRegressionModel and pass that in as the first or last argument etc.

Thanks
Shivaram

On Wed, Dec 9, 2015 at 10:11 AM, Chris Freeman <cf...@alteryx.com> wrote:
> Hey everyone,
>
> I’m currently looking at ways to save out SparkML model objects from
> SparkR and I’ve had some luck putting the model into an RDD and then
> saving the RDD as an Object File. Once it’s saved, I’m able to load it
> back in with something like:
>
> sc.objectFile[LinearRegressionModel](“path/to/model”)
>
> I’d like to try and replicate this same process from SparkR using the
> JVM backend APIs (e.g. “callJMethod”), but so far I haven’t been able
> to replicate my success and I’m guessing that it’s (at least in part)
> due to the necessity of specifying the type when calling the objectFile method.
>
> Does anyone know if this is actually possible? For example, here’s
> what I’ve come up with so far:
>
> loadModel <- function(sc, modelPath) {
>   modelRDD <- SparkR:::callJMethod(sc,
>                                    "objectFile[PipelineModel]",
>                                    modelPath,
>                                    SparkR:::getMinPartitions(sc, NULL))
>   return(modelRDD)
> }
>
> Any help is appreciated!
>
> --
> Chris Freeman
>


RE: Specifying Scala types when calling methods from SparkR

Posted by Chris Freeman <cf...@alteryx.com>.
Hi Sun Rui,

I’ve had some luck simply using “objectFile” when saving from SparkR directly. The problem is that if you do it that way, the model object will only work if you continue to use the current Spark Context, and I think model persistence should really enable you to use the model at a later time. That’s where I found that I could drop down to the JVM level and interact with the Scala object directly, but that seems to only work if you specify the type.
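
To make that concrete, the SparkR-side pattern described above might look roughly like the sketch below (the RDD API is internal to SparkR in 1.x, hence the ::: calls; the names are illustrative and this is untested). Since what gets serialized is the R-side reference to the model rather than the JVM object itself, the saved file is only usable while the original context is alive.

# Wrap the model handle in an RDD and save it as an object file
modelRDD <- SparkR:::parallelize(sc, list(model), 1L)
SparkR:::saveAsObjectFile(modelRDD, "path/to/model")

# Reading it back only works in the same session, because the serialized
# content is the R-side reference, not the JVM model
restoredRDD <- SparkR:::objectFile(sc, "path/to/model")
model2 <- SparkR:::collect(restoredRDD)[[1]]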



On December 9, 2015 at 7:59:43 PM, Sun, Rui (rui.sun@intel.com) wrote:

Hi,

Just use "objectFile" instead of "objectFile[PipelineModel]" for callJMethod. You can take the objectFile() in context.R as an example.

Since the SparkContext created in SparkR is actually a JavaSparkContext, there is no need to pass the implicit ClassTag.

-----Original Message-----
From: Shivaram Venkataraman [mailto:shivaram@eecs.berkeley.edu]
Sent: Thursday, December 10, 2015 8:21 AM
To: Chris Freeman
Cc: dev@spark.apache.org
Subject: Re: Specifying Scala types when calling methods from SparkR

The SparkR callJMethod can only invoke methods as they show up in the Java byte code. So in this case you'll need to check the SparkContext byte code (with javap or something like that) to see how that method looks. My guess is the type is passed in as a class tag argument, so you'll need to do something like create a class tag for the LinearRegressionModel and pass that in as the first or last argument etc.

Thanks
Shivaram

On Wed, Dec 9, 2015 at 10:11 AM, Chris Freeman <cf...@alteryx.com> wrote:
> Hey everyone,
>
> I’m currently looking at ways to save out SparkML model objects from
> SparkR and I’ve had some luck putting the model into an RDD and then
> saving the RDD as an Object File. Once it’s saved, I’m able to load it
> back in with something like:
>
> sc.objectFile[LinearRegressionModel](“path/to/model”)
>
> I’d like to try and replicate this same process from SparkR using the
> JVM backend APIs (e.g. “callJMethod”), but so far I haven’t been able
> to replicate my success and I’m guessing that it’s (at least in part)
> due to the necessity of specifying the type when calling the objectFile method.
>
> Does anyone know if this is actually possible? For example, here’s
> what I’ve come up with so far:
>
> loadModel <- function(sc, modelPath) {
>   modelRDD <- SparkR:::callJMethod(sc,
>                                    "objectFile[PipelineModel]",
>                                    modelPath,
>                                    SparkR:::getMinPartitions(sc, NULL))
>   return(modelRDD)
> }
>
> Any help is appreciated!
>
> --
> Chris Freeman
>



RE: Specifying Scala types when calling methods from SparkR

Posted by "Sun, Rui" <ru...@intel.com>.
Hi,

Just use "objectFile" instead of "objectFile[PipelineModel]" for callJMethod. You can take the objectFile() in context.R as an example.

Since the SparkContext created in SparkR is actually a JavaSparkContext, there is no need to pass the implicit ClassTag.
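
Concretely, dropping the type parameter from the earlier attempt gives something like this minimal sketch, modeled on objectFile() in context.R (untested; the returned value is just a handle to the underlying JavaRDD):

loadModel <- function(sc, modelPath) {
  # Call the erased "objectFile" method on the JavaSparkContext handle;
  # the Java API takes no ClassTag argument
  modelRDD <- SparkR:::callJMethod(sc,
                                   "objectFile",
                                   modelPath,
                                   SparkR:::getMinPartitions(sc, NULL))
  return(modelRDD)
}

If that succeeds, something like SparkR:::callJMethod(modelRDD, "first") should, in principle, hand back a reference to the deserialized PipelineModel, though that last step is an assumption rather than something verified here.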

-----Original Message-----
From: Shivaram Venkataraman [mailto:shivaram@eecs.berkeley.edu] 
Sent: Thursday, December 10, 2015 8:21 AM
To: Chris Freeman
Cc: dev@spark.apache.org
Subject: Re: Specifying Scala types when calling methods from SparkR

The SparkR callJMethod can only invoke methods as they show up in the Java byte code. So in this case you'll need to check the SparkContext byte code (with javap or something like that) to see how that method looks. My guess is the type is passed in as a class tag argument, so you'll need to do something like create a class tag for the LinearRegressionModel and pass that in as the first or last argument etc.

Thanks
Shivaram

On Wed, Dec 9, 2015 at 10:11 AM, Chris Freeman <cf...@alteryx.com> wrote:
> Hey everyone,
>
> I’m currently looking at ways to save out SparkML model objects from 
> SparkR and I’ve had some luck putting the model into an RDD and then 
> saving the RDD as an Object File. Once it’s saved, I’m able to load it 
> back in with something like:
>
> sc.objectFile[LinearRegressionModel](“path/to/model”)
>
> I’d like to try and replicate this same process from SparkR using the 
> JVM backend APIs (e.g. “callJMethod”), but so far I haven’t been able 
> to replicate my success and I’m guessing that it’s (at least in part) 
> due to the necessity of specifying the type when calling the objectFile method.
>
> Does anyone know if this is actually possible? For example, here’s 
> what I’ve come up with so far:
>
> loadModel <- function(sc, modelPath) {
>   modelRDD <- SparkR:::callJMethod(sc,
>                                    "objectFile[PipelineModel]",
>                                    modelPath,
>                                    SparkR:::getMinPartitions(sc, NULL))
>   return(modelRDD)
> }
>
> Any help is appreciated!
>
> --
> Chris Freeman
>



Re: Specifying Scala types when calling methods from SparkR

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
The SparkR callJMethod can only invoke methods as they show up in the
Java byte code. So in this case you'll need to check the SparkContext
byte code (with javap or something like that) to see how that method
looks. My guess is the type is passed in as a class tag argument, so
you'll need to do something like create a class tag for the
LinearRegressionModel and pass that in as the first or last argument
etc.

Thanks
Shivaram

On Wed, Dec 9, 2015 at 10:11 AM, Chris Freeman <cf...@alteryx.com> wrote:
> Hey everyone,
>
> I’m currently looking at ways to save out SparkML model objects from SparkR
> and I’ve had some luck putting the model into an RDD and then saving the RDD
> as an Object File. Once it’s saved, I’m able to load it back in with
> something like:
>
> sc.objectFile[LinearRegressionModel](“path/to/model”)
>
> I’d like to try and replicate this same process from SparkR using the JVM
> backend APIs (e.g. “callJMethod”), but so far I haven’t been able to
> replicate my success and I’m guessing that it’s (at least in part) due to
> the necessity of specifying the type when calling the objectFile method.
>
> Does anyone know if this is actually possible? For example, here’s what I’ve
> come up with so far:
>
> loadModel <- function(sc, modelPath) {
>   modelRDD <- SparkR:::callJMethod(sc,
>                                    "objectFile[PipelineModel]",
>                                    modelPath,
>                                    SparkR:::getMinPartitions(sc, NULL))
>   return(modelRDD)
> }
>
> Any help is appreciated!
>
> --
> Chris Freeman
>
