Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2017/07/25 09:46:14 UTC

some Ideas on expressing Spark SQL using JSON

Hi All,

I am thinking of expressing Spark SQL using JSON in the following way.

For Example:

*Query using Spark DSL*

// filter to rows where name == "john", then sum hourlyPay per
// 24-hour tumbling window
DS.filter(col("name").equalTo("john"))
  .groupBy(functions.window(DS.col("TIMESTAMP"), "24 hours", "24 hours"),
           DS.col("hourlyPay"))
  .agg(sum("hourlyPay").as("total"));


*Query using JSON*

[The inline image with the JSON version of the query did not survive the
plain-text archive.]
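A minimal sketch of what such a JSON encoding might look like for the query
above; every field name here (filter, op, groupBy, window, agg, alias) is a
hypothetical design choice of this sketch, not an existing Spark API:

{
  "filter": { "op": "equalTo", "column": "name", "value": "john" },
  "groupBy": [
    { "window": { "column": "TIMESTAMP",
                  "windowDuration": "24 hours",
                  "slideDuration": "24 hours" } },
    { "column": "hourlyPay" }
  ],
  "agg": { "op": "sum", "column": "hourlyPay", "alias": "total" }
}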
The goal is to design a JSON DSL in which users can express Spark SQL
queries, so they can send queries over REST and get the results back. Now,
I am sure there are BI tools and notebooks like Zeppelin that can
accomplish the desired behavior; however, I believe there may be a group
of users who don't want to use those BI tools or notebooks and instead
want all communication from front end to back end to go through APIs.

Another goal is that the JSON DSL should closely mimic the underlying
Spark SQL DSL, as the sketch below illustrates.
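To make the round trip concrete, here is a minimal sketch of how a REST
backend might translate the hypothetical JSON above back into Dataset
operations. The JsonQueryTranslator class, the set of handled operations
(only equalTo and sum), and the JSON field names are all assumptions of
this sketch, not an existing library:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

import static org.apache.spark.sql.functions.*;

public class JsonQueryTranslator {

    // Turn one JSON query document into a chain of Dataset calls.
    public static Dataset<Row> translate(Dataset<Row> ds, String json) throws Exception {
        JsonNode q = new ObjectMapper().readTree(json);

        // "filter": this sketch only handles equalTo
        JsonNode f = q.get("filter");
        Dataset<Row> filtered =
            ds.filter(col(f.get("column").asText()).equalTo(f.get("value").asText()));

        // "groupBy": each grouping key is either a time window or a plain column
        JsonNode g = q.get("groupBy");
        Column[] keys = new Column[g.size()];
        for (int i = 0; i < g.size(); i++) {
            JsonNode k = g.get(i);
            if (k.has("window")) {
                JsonNode w = k.get("window");
                keys[i] = window(col(w.get("column").asText()),
                                 w.get("windowDuration").asText(),
                                 w.get("slideDuration").asText());
            } else {
                keys[i] = col(k.get("column").asText());
            }
        }

        // "agg": this sketch only handles sum
        JsonNode a = q.get("agg");
        return filtered.groupBy(keys)
                       .agg(sum(a.get("column").asText()).as(a.get("alias").asText()));
    }
}

A REST endpoint would then only need to accept the JSON body, call
translate, and serialize the resulting rows back to the caller.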

Please feel free to provide feedback or criticism to whatever extent you
like!

Thanks!

Re: some Ideas on expressing Spark SQL using JSON

Posted by Gourav Sengupta <go...@gmail.com>.
100% agreed with Sathish. In case I am not offending anyone, this kind of
question usually comes from individuals who are still in the Java mindset
of solving problems from around 10 years back. That is why you see a lot
of user issues from people who are used to writing around 1,000 lines of
code for what would be 10 lines of Scala or Python, reporting problems
that are already solved if you start using Spark the way it is
fundamentally designed.
Genuinely sorry if this has offended anyone; I just thought it might be
useful to point this out in case anyone wants to correct their fundamental
approach.


Re: some Ideas on expressing Spark SQL using JSON

Posted by Sathish Kumaran Vairavelu <vs...@gmail.com>.
Agreed. For the same reason we have DataFrames / Datasets, which are
another DSL used in Spark.

Re: some Ideas on expressing Spark SQL using JSON

Posted by Georg Heiler <ge...@gmail.com>.
Because Spark's DSL partially supports compile-time type safety. E.g. the
compiler will notify you that a SQL function was misspelled when using the
DSL, as opposed to a plain SQL string, which is only parsed at runtime.
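For example, a hypothetical illustration of that point with a deliberately
misspelled aggregate (the Dataset ds and the "payments" view are assumed):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.sum;

public class TypeSafetyDemo {
    static void demo(SparkSession spark, Dataset<Row> ds) {
        // DSL: a misspelled function name is rejected by the compiler.
        // ds.agg(sumx(ds.col("hourlyPay")));      // does not compile: no such method
        ds.agg(sum(ds.col("hourlyPay"))).show();   // checked at compile time

        // Plain SQL string: the same typo is only caught at runtime, when
        // Spark parses and analyzes the query (AnalysisException).
        ds.createOrReplaceTempView("payments");
        spark.sql("SELECT sumx(hourlyPay) FROM payments").show();
    }
}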

Re: some Ideas on expressing Spark SQL using JSON

Posted by Sathish Kumaran Vairavelu <vs...@gmail.com>.
Just a thought: SQL itself is a DSL. Why build a DSL on top of another DSL?