Posted to user@predictionio.apache.org by Fangzhou Yang <fa...@hotmail.com> on 2017/04/19 07:06:17 UTC

Does v0.11 support Spark ML, DataFrame and Pipeline?

Hi all,



I'm new to PredictionIO. I noticed that v0.11 already supports Spark 2.x. Does it also support Spark ML, DataFrame and Pipeline? It seems the Algorithm interfaces support only Spark RDDs. If Spark ML is not supported yet, is it on the roadmap? Is anyone already working on it?


Many Thanks,

Fangzhou

Re: Does v0.11 support Spark ML, DataFrame and Pipeline?

Posted by Fangzhou Yang <fa...@hotmail.com>.
Thank you. Now I see. The data input for the algorithm interface is an RDD, but we can convert it into a DataFrame manually for a Spark ML pipeline algorithm.
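
A minimal sketch of that conversion, assuming Spark 2.x with spark-mllib on the classpath. The names LabeledRow, RddToPipelineSketch, toDataFrame and trainPipeline are hypothetical and are not part of the PredictionIO API; the record type stands in for whatever RDD the template's own data source produces.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical training record: one label and two numeric features.
case class LabeledRow(label: Double, f1: Double, f2: Double)

object RddToPipelineSketch {

  // Convert an RDD of case-class rows into a DataFrame.
  // Column names are taken from the case-class field names.
  def toDataFrame(spark: SparkSession, rows: RDD[LabeledRow]): DataFrame =
    spark.createDataFrame(rows)

  // Fit a two-stage Spark ML Pipeline: assemble the feature vector,
  // then train a logistic regression on the "label" column.
  def trainPipeline(df: DataFrame): PipelineModel = {
    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2"))
      .setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10)
    new Pipeline().setStages(Array(assembler, lr)).fit(df)
  }
}
```

Inside an algorithm's train method, which receives a SparkContext (here called sc), a SparkSession can be obtained with SparkSession.builder().config(sc.getConf).getOrCreate() and passed to toDataFrame. The returned PipelineModel exposes transform(df) for prediction.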

I have also found a template that uses DataFrames and Spark ML for training and prediction: https://github.com/goliasz/pio-template-sr

________________________________
From: Pat Ferrel <pa...@occamsmachete.com>
Sent: Friday, April 21, 2017 1:55:42 AM
To: user@predictionio.incubator.apache.org
Subject: Re: Does v0.11 support Spark ML, DataFrame and Pipeline?

No, the API for access to data in the EventServer does not require DataFrames and so does not use them, but you can easily convert the data into one if you need it. As for Spark ML, use whatever you need in your algorithm. There are no restrictions as long as you build PIO for Spark 2 and include whatever libs you need in your template's build.sbt.

I maintain The Universal Recommender, which uses Mahout on Spark, not MLlib. It also does not use Spark for deployed query serving, which is typical of many Templates. So there is room to use your own architecture as long as it fits the general patterns.

https://github.com/actionml/universal-recommender


On Apr 19, 2017, at 11:28 PM, Fangzhou Yang <fa...@hotmail.com> wrote:

Thanks for the reply.

As I understand it, the template algorithm uses the PAlgorithm interface from PIO, which uses RDDs instead of DataFrames. Can I also implement a template algorithm with Spark ML and DataFrames? Is there a guide online?

@Pat Ferrel Is the template that you maintain on GitHub? If so, could you provide the link?

Many Thanks,
Fangzhou
________________________________
From: Pat Ferrel <pa...@occamsmachete.com>
Sent: Wednesday, April 19, 2017 10:37:08 PM
To: user@predictionio.incubator.apache.org
Subject: Re: Does v0.11 support Spark ML, DataFrame and Pipeline?

Templates are not restricted in which parts of Spark they use. The ones you are looking at simply don't need those interfaces. If you need them and are writing templates, you can use them. In fact, I maintain a template that does not use Spark for the Algorithm, only for IO.

If you think some new API should be in the default PIO API, which would that be?


On Apr 19, 2017, at 12:06 AM, Fangzhou Yang <fa...@hotmail.com> wrote:

Hi all,


I'm new to PredictionIO. I noticed that v0.11 already supports Spark 2.x. Does it also support Spark ML, DataFrame and Pipeline? It seems the Algorithm interfaces support only Spark RDDs. If Spark ML is not supported yet, is it on the roadmap? Is anyone already working on it?

Many Thanks,
Fangzhou


Re: Does v0.11 support Spark ML, DataFrame and Pipeline?

Posted by Pat Ferrel <pa...@occamsmachete.com>.
No, the API for access to data in the EventServer does not require DataFrames and so does not use them, but you can easily convert the data into one if you need it. As for Spark ML, use whatever you need in your algorithm. There are no restrictions as long as you build PIO for Spark 2 and include whatever libs you need in your template's build.sbt.
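
As a rough illustration of the build.sbt point, a template that wants Spark ML could declare something like the fragment below. The version numbers are assumptions and should be matched to the Scala and Spark versions your PIO installation was actually built with.

```scala
// build.sbt (fragment). Versions below are assumptions, not requirements.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // PredictionIO core, provided by the PIO installation at runtime.
  "org.apache.predictionio" %% "apache-predictionio-core" % "0.11.0-incubating" % "provided",
  // Spark core plus spark-mllib, which ships both the RDD-based
  // org.apache.spark.mllib and the DataFrame-based org.apache.spark.ml APIs.
  "org.apache.spark" %% "spark-core"  % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.0" % "provided"
)
```

The Spark artifacts are marked "provided" on the assumption that the cluster supplies them when the template is trained and deployed.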

I maintain The Universal Recommender, which uses Mahout on Spark, not MLlib. It also does not use Spark for deployed query serving, which is typical of many Templates. So there is room to use your own architecture as long as it fits the general patterns.

https://github.com/actionml/universal-recommender




Re: Does v0.11 support Spark ML, DataFrame and Pipeline?

Posted by Fangzhou Yang <fa...@hotmail.com>.
Thanks for the reply.


As I understand it, the template algorithm uses the PAlgorithm interface from PIO, which uses RDDs instead of DataFrames. Can I also implement a template algorithm with Spark ML and DataFrames? Is there a guide online?


@Pat Ferrel Is the template that you maintain on GitHub? If so, could you provide the link?


Many Thanks,

Fangzhou



Re: Does v0.11 support Spark ML, DataFrame and Pipeline?

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Templates are not restricted in which parts of Spark they use. The ones you are looking at simply don't need those interfaces. If you need them and are writing templates, you can use them. In fact, I maintain a template that does not use Spark for the Algorithm, only for IO.

If you think some new API should be in the default PIO API, which would that be?

