Posted to user@spark.apache.org by bo yang <bo...@gmail.com> on 2017/06/12 05:29:50 UTC

Use SQL Script to Write Spark SQL Jobs

Hi Guys,

I am writing a small open source project
<https://github.com/uber/uberscriptquery> that uses SQL scripts to write
Spark jobs. I want to see if other people are interested in using or
contributing to it.

The project is called UberScriptQuery (
https://github.com/uber/uberscriptquery). Sorry for the clumsy name; I
chose it to avoid conflicts with other names (Spark is a registered
trademark, so I could not use "Spark" in the project name).

In short, it is a high-level SQL-like DSL (Domain-Specific Language) on
top of Spark. People can use that DSL to write Spark jobs without
worrying about Spark's internal details. Please check the README
<https://github.com/uber/uberscriptquery> in the project for more
details.
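
To give a flavor, here is a minimal sketch of the plain Spark code that
one such script replaces. The table names and paths are made up for
illustration; this is not the exact code the library generates:

import org.apache.spark.sql.SparkSession

object EtlJobSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("etl-job").getOrCreate()

    // register the input as a temp view so later SQL can refer to it
    spark.read.json("/data/events.json").createOrReplaceTempView("events")

    // each SQL statement in a script maps to a spark.sql(...) call
    val result = spark.sql(
      "SELECT user_id, count(*) AS cnt FROM events GROUP BY user_id")

    // write the final result out (path made up for illustration)
    result.write.mode("overwrite").parquet("/data/output/daily_counts")

    spark.stop()
  }
}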

It would be great to get any feedback or suggestions!

Best,
Bo

Re: Use SQL Script to Write Spark SQL Jobs

Posted by bo yang <bo...@gmail.com>.
Hi Nihed,

Interesting to see Envelope. The idea there is the same! Thanks for
sharing :)

Best,
Bo



Re: Use SQL Script to Write Spark SQL Jobs

Posted by nihed mbarek <ni...@gmail.com>.
Hi

I have already seen a project with the same idea:
https://github.com/cloudera-labs/envelope

Regards,


M'BAREK Med Nihed,
Fedora Ambassador, TUNISIA, Northern Africa
http://www.nihed.com

<http://tn.linkedin.com/in/nihed>

Re: Use SQL Script to Write Spark SQL Jobs

Posted by bo yang <bo...@gmail.com>.
Thanks, Benjamin and Ayan, for the feedback! You two represent the two
groups of people who would or would not need such a scripting tool.
Personally, I find the script very useful for writing ETL pipelines and
daily jobs. Let's see whether other people are interested in such a
project.

Best,
Bo






Re: Use SQL Script to Write Spark SQL Jobs

Posted by ayan guha <gu...@gmail.com>.
Hi

IMHO, this approach is not very useful.

First, on the two use cases mentioned in the project page:

1. Simplify Spark development - I think the only thing that can be done
there is to come up with some boilerplate function, which essentially
takes a SQL statement and comes back with a temp table name and a
corresponding DataFrame (remember the project targets structured data
sources only, not streaming or RDDs); a sketch of such a function
follows this list. Building another mini-DSL on top of the already
fairly elaborate Spark API never appealed to me.

2. Business analysts using Spark - the single-word answer is notebooks.
Take your pick: Jupyter, Zeppelin, Hue.
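
To be concrete, a minimal sketch of such a boilerplate function could
look like this (all names are illustrative only, not from any existing
project):

import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.sql.{DataFrame, SparkSession}

object SqlBoilerplate {
  private val counter = new AtomicLong(0)

  // Run one SQL statement, register the result under a generated temp
  // view name so later statements can refer to it, and hand back both
  // the name and the DataFrame.
  def runSql(spark: SparkSession, sql: String): (String, DataFrame) = {
    val df = spark.sql(sql)
    val viewName = s"tmp_view_${counter.incrementAndGet()}"
    df.createOrReplaceTempView(viewName)
    (viewName, df)
  }
}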

The notion that "Spark is for developers", IMHO, stemmed from the
packaging/building overhead of Spark apps. For Python users, this
barrier is considerably lower (and maybe that is why I do not see a
prominent need).

But I can imagine the pain of a SQL developer coming into the
Scala/Java world. I came from a hardcore SQL/DWH environment where I
wrote SQL and SQL only, so SBT and MVN are still not my friends. Maybe
someday they will be. But I learned them the hard way, because the
value of using Spark more than offsets the pain. So I think one needs
to spend time with the environment to get comfortable with it. And
maybe, just maybe, use NiFi if you miss drag-and-drop features too
much :)

But these are my 2c and my sincerely humble opinion, and I wish you all
the luck with your project.



-- 
Best Regards,
Ayan Guha

Re: Use SQL Script to Write Spark SQL Jobs

Posted by Benjamin Kim <bb...@gmail.com>.
Hi Bo,

+1 for your project. I come from the world of data warehouses, ETL, and reporting analytics. There are many individuals who do not know how to code or do not want to do any coding. They are content with ANSI SQL and stick to it. ETL workflows are also built without any coding, using a drag-and-drop user interface such as Talend, SSIS, etc. There is a small amount of scripting involved, but not too much. I looked at what you are trying to do, and I welcome it. This could open up Spark to the masses and shorten development times.

Cheers,
Ben




Re: Use SQL Script to Write Spark SQL Jobs

Posted by bo yang <bo...@gmail.com>.
Hi Aakash,

Thanks for your willingness to help :) It would be great to get more
feedback on my project. For example, do other people feel the need for
a script-based way to write Spark jobs easily? Also, I would like to
explore whether the Spark project itself could take on some work to
build such a script-based high-level DSL.

Best,
Bo



Re: Use SQL Script to Write Spark SQL Jobs

Posted by Aakash Basu <aa...@gmail.com>.
Hey,

I work on Spark SQL and should be able to help you with this. Let me
know your requirements.

Thanks,
Aakash.
