Posted to dev@spark.apache.org by Hemant Bhanawat <he...@gmail.com> on 2018/08/30 06:45:28 UTC

mllib + SQL

Is there a plan to support SQL extensions for mllib? Or is there an effort
already underway?

Any information is appreciated.

Thanks in advance.
Hemant

Re: mllib + SQL

Posted by Hemant Bhanawat <he...@gmail.com>.
BTW, I can contribute if there is already an effort going on somewhere.

On Fri, Aug 31, 2018 at 3:35 PM Hemant Bhanawat <he...@gmail.com>
wrote:

> We allow our users to interact with the Spark cluster using SQL queries
> only. That's easy for them. MLlib does not have SQL extensions, so we
> cannot expose it to our users.
>
> SQL extensions can further accelerate MLlib's adoption. See
> https://cloud.google.com/bigquery/docs/bigqueryml-intro.
>
> Hemant

Re: mllib + SQL

Posted by Hemant Bhanawat <he...@gmail.com>.
SQL, in addition to being simple, also provides a standard way of doing
analysis across multiple databases. Users would like that same
standardization for machine learning as well.

The flexibility of Spark's API is definitely helpful, but new users want a
simple and standard way to do machine learning.

IMO, SQL on ML should come as an incremental addition to Spark's
capabilities.


On Fri, Aug 31, 2018, 7:14 PM Sean Owen <sr...@gmail.com> wrote:

> My $0.02 -- this isn't worthwhile.
>
> Yes, there are ML-in-SQL tools. I'm thinking of MADlib, for example. I
> think these are holdovers from the days when someone's only interface to a
> data warehouse was SQL, and so there had to be SQL-language support for
> invoking ML jobs. There was no programmatic alternative.
>
> There's nothing particularly helpful about SQL as a language for
> expressing this, versus simply writing operations in a high-level
> programming language.
>
> Spark is that programmatic paradigm, and offers a more general way to
> express ETL, ML and SQL within their own appropriate DSLs. There's no need
> to also shoehorn Spark ML into Spark SQL.
>
> I also think there's a bit of false abstraction here. The nice thing about
> SQL-only access to these functions is that it sounds much simpler and more
> accessible to people who know only SQL and nothing about Python or JVMs.
> In practice, using Spark means having some basic awareness of its
> distributed execution environment. SQL-only analysts would struggle to be
> effective with SQL-only access to Spark.

Re: mllib + SQL

Posted by Sean Owen <sr...@gmail.com>.
My $0.02 -- this isn't worthwhile.

Yes, there are ML-in-SQL tools. I'm thinking of MADlib, for example. I think
these are holdovers from the days when someone's only interface to a data
warehouse was SQL, and so there had to be SQL-language support for invoking
ML jobs. There was no programmatic alternative.

There's nothing particularly helpful about SQL as a language for expressing
this, versus simply writing operations in a high-level programming language.

Spark is that programmatic paradigm, and offers a more general way to
express ETL, ML and SQL within their own appropriate DSLs. There's no need
to also shoehorn Spark ML into Spark SQL.

I also think there's a bit of false abstraction here. The nice thing about
SQL-only access to these functions is that it sounds much simpler and more
accessible to people who know only SQL and nothing about Python or JVMs.
In practice, using Spark means having some basic awareness of its
distributed execution environment. SQL-only analysts would struggle to be
effective with SQL-only access to Spark.

On Fri, Aug 31, 2018 at 5:05 AM Hemant Bhanawat <he...@gmail.com>
wrote:

> We allow our users to interact with the Spark cluster using SQL queries
> only. That's easy for them. MLlib does not have SQL extensions, so we
> cannot expose it to our users.
>
> SQL extensions can further accelerate MLlib's adoption. See
> https://cloud.google.com/bigquery/docs/bigqueryml-intro.
>
> Hemant

Re: mllib + SQL

Posted by Hemant Bhanawat <he...@gmail.com>.
We allow our users to interact with the Spark cluster using SQL queries
only. That's easy for them. MLlib does not have SQL extensions, so we
cannot expose it to our users.

SQL extensions can further accelerate MLlib's adoption. See
https://cloud.google.com/bigquery/docs/bigqueryml-intro.

Hemant
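
To make the request concrete, here is a minimal sketch of what "ML exposed
through SQL" could look like to a SQL-only user. It deliberately uses
Python's built-in sqlite3 module rather than Spark, so it runs without a
cluster; it is not an existing or proposed Spark or MLlib API. The table
name (sales) and the function names (linreg_slope, linreg_intercept,
predict_revenue) are hypothetical, and the model is a plain one-variable
least-squares fit chosen only because it is short.

```python
import sqlite3


class _LinRegStats:
    """Accumulates the sums needed for a one-variable least-squares fit."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def step(self, x, y):
        # sqlite3 calls step() once per input row of the aggregate.
        if x is None or y is None:
            return
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def _slope(self):
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            return None  # too few rows, or all x values identical
        return (self.n * self.sxy - self.sx * self.sy) / denom


class LinRegSlope(_LinRegStats):
    def finalize(self):
        return self._slope()


class LinRegIntercept(_LinRegStats):
    def finalize(self):
        slope = self._slope()
        if slope is None:
            return None
        return (self.sy - slope * self.sx) / self.n


conn = sqlite3.connect(":memory:")

# Hypothetical SQL-visible names; registered once by whoever runs the service.
conn.create_aggregate("linreg_slope", 2, LinRegSlope)
conn.create_aggregate("linreg_intercept", 2, LinRegIntercept)

conn.execute("CREATE TABLE sales (ad_spend REAL, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],  # toy data: y = 2x + 1
)

# "Training" is an ordinary SQL query from the user's point of view.
slope, intercept = conn.execute(
    "SELECT linreg_slope(ad_spend, revenue), "
    "       linreg_intercept(ad_spend, revenue) FROM sales"
).fetchone()

# The fitted model is then exposed as a scalar SQL function for scoring.
conn.create_function("predict_revenue", 1, lambda x: slope * x + intercept)

predictions = conn.execute(
    "SELECT ad_spend, predict_revenue(ad_spend) FROM sales ORDER BY ad_spend"
).fetchall()
```

The point of the sketch is only the division of labour: someone with
programmatic access registers the training and scoring functions once, and
SQL-only users then call them like any other function.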


On Thu, Aug 30, 2018 at 9:41 PM William Benton <wi...@redhat.com> wrote:

> What are you interested in accomplishing?
>
> The spark.ml package has provided a machine learning API based on
> DataFrames for quite some time.  If you are interested in mixing query
> processing and machine learning, this is certainly the best place to start.
>
> See here:  https://spark.apache.org/docs/latest/ml-guide.html
>
>
> best,
> wb

Re: mllib + SQL

Posted by William Benton <wi...@redhat.com>.
What are you interested in accomplishing?

The spark.ml package has provided a machine learning API based on
DataFrames for quite some time.  If you are interested in mixing query
processing and machine learning, this is certainly the best place to start.

See here:  https://spark.apache.org/docs/latest/ml-guide.html


best,
wb



On Thu, Aug 30, 2018 at 1:45 AM Hemant Bhanawat <he...@gmail.com>
wrote:

> Is there a plan to support SQL extensions for mllib? Or is there an effort
> already underway?
>
> Any information is appreciated.
>
> Thanks in advance.
> Hemant
>