Posted to user@spark.apache.org by oppokui <op...@gmail.com> on 2014/09/03 12:19:16 UTC

Support R in Spark

Does the Spark ML team plan to support R scripts natively? There is a SparkR project, but it is not from the Spark team. Spark ML uses netlib-java to talk to native Fortran routines, or uses NumPy, so why not use R in some form?

R has a lot of useful packages. If the Spark ML team can include R support, it will be very powerful.

Any comments?


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: Support R in Spark

Posted by oppokui <op...@gmail.com>.

Thanks, Shivaram. 

Kui

> On Sep 19, 2014, at 12:58 AM, Shivaram Venkataraman <sh...@eecs.berkeley.edu> wrote:
> 
> As R is single-threaded, SparkR launches one R process per executor on
> the worker side.
> 
> Thanks
> Shivaram

Re: Support R in Spark

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.

As R is single-threaded, SparkR launches one R process per executor on
the worker side.

Thanks
Shivaram
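
A minimal sketch of the process model discussed in this message: one long-lived child process per executor, reused for every partition, with byte arrays sent and received over pipes. This is not SparkR's code. A Python child stands in for the R interpreter so the sketch runs without R, and the 4-byte length prefix, ExecutorWorker, write_frame and read_frame are all assumptions made for illustration.

```python
import struct
import subprocess
import sys

# Child program standing in for the R interpreter (an assumption so the
# sketch runs without R). It reads length-prefixed frames from stdin,
# upper-cases each payload, and writes a length-prefixed reply to stdout.
CHILD_SOURCE = r"""
import struct, sys
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    header = inp.read(4)
    if len(header) < 4:  # parent closed the pipe, so exit
        break
    (size,) = struct.unpack(">I", header)
    reply = inp.read(size).upper()
    out.write(struct.pack(">I", len(reply)) + reply)
    out.flush()
"""


def write_frame(stream, payload: bytes) -> None:
    """Write one byte array, prefixed with its 4-byte big-endian length."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)
    stream.flush()


def read_frame(stream) -> bytes:
    """Read one length-prefixed byte array written by write_frame."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream closed before a full frame header arrived")
    (size,) = struct.unpack(">I", header)
    payload = stream.read(size)
    if len(payload) < size:
        raise EOFError("stream closed before the full payload arrived")
    return payload


class ExecutorWorker:
    """Hypothetical executor-side handle: one child process, reused per partition."""

    def __init__(self) -> None:
        # The child is launched once, when the worker is created.
        self.proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )

    def run_partition(self, payload: bytes) -> bytes:
        # Each partition is one request/reply exchange with the same child.
        write_frame(self.proc.stdin, payload)
        return read_frame(self.proc.stdout)

    def close(self) -> None:
        # Closing stdin lets the child see end-of-file and exit cleanly.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


if __name__ == "__main__":
    worker = ExecutorWorker()
    try:
        for part in (b"partition one", b"partition two"):
            print(worker.run_partition(part))
    finally:
        worker.close()
```

Because the child handles one frame at a time, requests to it are naturally serialized, which is the consequence of a single-threaded interpreter that the message points to.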

On Thu, Sep 18, 2014 at 7:49 AM, oppokui <op...@gmail.com> wrote:
> Shivaram,
>
> As far as I know, SparkR uses the rJava package. On a worker node, Spark code executes R code by launching an R process and sending/receiving byte arrays.
> My question is when the R process is launched: is there one R process per worker process, per executor thread, or per RDD being processed?
>
> Thanks and Regards.
>
> Kui

Re: Support R in Spark

Posted by oppokui <op...@gmail.com>.

Shivaram, 

As far as I know, SparkR uses the rJava package. On a worker node, Spark code executes R code by launching an R process and sending/receiving byte arrays.
My question is when the R process is launched: is there one R process per worker process, per executor thread, or per RDD being processed?

Thanks and Regards.

Kui  

> On Sep 6, 2014, at 5:53 PM, oppokui <op...@gmail.com> wrote:
> 
> Cool! That is very good news. Can’t wait for it.
> 
> Kui 

Re: Support R in Spark

Posted by oppokui <op...@gmail.com>.

Cool! That is very good news. Can’t wait for it.

Kui 

> On Sep 5, 2014, at 1:58 AM, Shivaram Venkataraman <sh...@eecs.berkeley.edu> wrote:
> 
> Thanks Kui. SparkR is a pretty young project, but there are a bunch of
> things we are working on. One of the main features is to expose a data
> frame API (https://sparkr.atlassian.net/browse/SPARKR-1) and we will
> be integrating this with Spark's MLlib. At a high level, this will
> allow R users to use a familiar API but make use of MLlib's efficient
> distributed implementation. This is the same strategy used in Python
> as well.
> 
> Also we do hope to merge SparkR with mainline Spark -- we have a few
> features to complete before that and plan to shoot for integration by
> Spark 1.3.
> 
> Thanks
> Shivaram

Re: Support R in Spark

Posted by Christopher Nguyen <ct...@adatao.com>.

Hi Kui, sorry about that. That link you mentioned is probably the one for
the products. We don't have one pointing from adatao.com to ddf.io; maybe
we'll add it.

As for access to the code base itself, I think the team has already created
a GitHub repo for it, and should open it up within a few weeks. There's
some debate about whether to put out the implementation with Shark
dependencies now, or one on SparkSQL that has somewhat limited
functionality and is not as well tested.

I'll check and ping when this is opened up.

The license is Apache.

Sent while mobile. Please excuse typos etc.
On Sep 6, 2014 1:39 PM, "oppokui" <op...@gmail.com> wrote:

> Thanks, Christopher. I saw it before; it is amazing. Last time I tried to
> download it from adatao, but got no response after filling in the form. How
> can I download it or its source code? What is the license?
>
> Kui

Re: Support R in Spark

Posted by oppokui <op...@gmail.com>.

Thanks, Christopher. I saw it before; it is amazing. Last time I tried to download it from adatao, but got no response after filling in the form. How can I download it or its source code? What is the license?

Kui


> On Sep 6, 2014, at 8:08 PM, Christopher Nguyen <ct...@adatao.com> wrote:
> 
> Hi Kui,
> 
> DDF (open sourced) also aims to do something similar, adding RDBMS idioms, and is already implemented on top of Spark.
> 
> One philosophy is that the DDF API aggressively hides the notion of parallel datasets, exposing only (mutable) tables to users, on which they can apply R and other familiar data mining/machine learning idioms, without having to know about the distributed representation underneath. Now, you can get to the underlying RDDs if you want to, simply by asking for them.
> 
> This was launched at the July Spark Summit. See http://spark-summit.org/2014/talk/distributed-dataframe-ddf-on-apache-spark-simplifying-big-data-for-the-rest-of-us .
> 
> Sent while mobile. Please excuse typos etc.

Re: Support R in Spark

Posted by Christopher Nguyen <ct...@adatao.com>.

Hi Kui,

DDF (open sourced) also aims to do something similar, adding RDBMS idioms,
and is already implemented on top of Spark.

One philosophy is that the DDF API aggressively hides the notion of
parallel datasets, exposing only (mutable) tables to users, on which they
can apply R and other familiar data mining/machine learning idioms, without
having to know about the distributed representation underneath. Now, you
can get to the underlying RDDs if you want to, simply by asking for them.

This was launched at the July Spark Summit. See
http://spark-summit.org/2014/talk/distributed-dataframe-ddf-on-apache-spark-simplifying-big-data-for-the-rest-of-us
.
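
A rough sketch of the idea described above: users work with a single mutable table, while the partitioned representation stays hidden unless they ask for it. This is not the DDF API. DistributedTable and its methods are hypothetical names, and plain Python lists stand in for RDD partitions.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class DistributedTable:
    """Hypothetical table facade over a list of partitions (not the DDF API)."""

    def __init__(self, partitions: List[List[Row]]) -> None:
        self._partitions = partitions

    def mean(self, column: str) -> float:
        # Compute a (sum, count) pair per partition and then combine them,
        # the way a distributed engine would reduce partial results.
        partials = [
            (sum(row[column] for row in part), len(part))
            for part in self._partitions
        ]
        total = sum(s for s, _ in partials)
        count = sum(n for _, n in partials)
        if count == 0:
            raise ValueError("cannot take the mean of an empty table")
        return total / count

    def set_column(self, column: str, fn: Callable[[Row], float]) -> None:
        # Mutates the table in place; the caller never sees the partitions.
        for part in self._partitions:
            for row in part:
                row[column] = fn(row)

    def partitions(self) -> List[List[Row]]:
        # Escape hatch: hand back the underlying partitions on request.
        return self._partitions


if __name__ == "__main__":
    table = DistributedTable([[{"x": 1.0}, {"x": 2.0}], [{"x": 6.0}]])
    table.set_column("y", lambda row: row["x"] * 2)
    print(table.mean("x"), table.mean("y"), len(table.partitions()))
```

The table-level calls never mention partitions, and partitions() plays the role of getting to the underlying RDDs by asking for them.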

Sent while mobile. Please excuse typos etc.
On Sep 4, 2014 1:59 PM, "Shivaram Venkataraman" <sh...@eecs.berkeley.edu>
wrote:

> Thanks Kui. SparkR is a pretty young project, but there are a bunch of
> things we are working on. One of the main features is to expose a data
> frame API (https://sparkr.atlassian.net/browse/SPARKR-1) and we will
> be integrating this with Spark's MLlib. At a high level, this will
> allow R users to use a familiar API but make use of MLlib's efficient
> distributed implementation. This is the same strategy used in Python
> as well.
>
> Also we do hope to merge SparkR with mainline Spark -- we have a few
> features to complete before that and plan to shoot for integration by
> Spark 1.3.
>
> Thanks
> Shivaram

Re: Support R in Spark

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.

Thanks Kui. SparkR is a pretty young project, but there are a bunch of
things we are working on. One of the main features is to expose a data
frame API (https://sparkr.atlassian.net/browse/SPARKR-1) and we will
be integrating this with Spark's MLlib. At a high level, this will
allow R users to use a familiar API but make use of MLlib's efficient
distributed implementation. This is the same strategy used in Python
as well.

Also we do hope to merge SparkR with mainline Spark -- we have a few
features to complete before that and plan to shoot for integration by
Spark 1.3.

Thanks
Shivaram

On Wed, Sep 3, 2014 at 9:24 PM, oppokui <op...@gmail.com> wrote:
> Thanks, Shivaram.
>
> No specific use case yet. We are trying to use R in our project because
> our data scientists all know R. Our concern is how R handles massive data.
> Spark does better in the big data area, and Spark ML focuses on predictive
> analysis, so we are wondering whether we can combine R and Spark. We tried
> SparkR and it is pretty easy to use, but we haven’t seen any feedback on
> this package from industry. It would be better if the Spark team supported
> R just like Scala/Java/Python.
>
> Another question: if MLlib will re-implement all the well-known data mining
> algorithms in Spark, what is the purpose of using R?
>
> Another option for us is H2O, which supports R natively. H2O is friendlier
> to data scientists. I saw that H2O can also work on Spark (Sparkling
> Water). Is it better than using SparkR?
>
> Thanks and Regards.
>
> Kui

Re: Support R in Spark

Posted by oppokui <op...@gmail.com>.

Thanks, Shivaram. 

No specific use case yet. We are trying to use R in our project because our data scientists all know R. Our concern is how R handles massive data. Spark does better in the big data area, and Spark ML focuses on predictive analysis, so we are wondering whether we can combine R and Spark. We tried SparkR and it is pretty easy to use, but we haven’t seen any feedback on this package from industry. It would be better if the Spark team supported R just like Scala/Java/Python.

Another question: if MLlib will re-implement all the well-known data mining algorithms in Spark, what is the purpose of using R?

Another option for us is H2O, which supports R natively. H2O is friendlier to data scientists. I saw that H2O can also work on Spark (Sparkling Water). Is it better than using SparkR?

Thanks and Regards.

Kui

> On Sep 4, 2014, at 1:47 AM, Shivaram Venkataraman <sh...@eecs.berkeley.edu> wrote:
> 
> Hi 
> 
> Do you have a specific use case where SparkR doesn't work well? We'd love to hear more about use cases and features that can be improved with SparkR.
> 
> Thanks
> Shivaram