Posted to dev@spark.apache.org by Wenchen Fan <cl...@gmail.com> on 2018/07/12 03:24:38 UTC

Re: [DISCUSS] SPIP: Standardize SQL logical plans

Hi Ryan,

Great job on this! Shall we call a vote for the plan standardization SPIP?
I think this is a good idea and we should do it.

Notes:
We definitely need new user-facing APIs to produce these new logical plans,
like DeleteData, but we will need a design doc for these new APIs after the
SPIP passes.
We definitely need the data source to provide the ability to
create/drop/alter/lookup tables, but that belongs to the other SPIP and
should be voted on separately.

Thanks,
Wenchen

On Fri, Apr 20, 2018 at 5:01 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Hi everyone,
>
> A few weeks ago, I wrote up a proposal to standardize SQL logical plans
> <https://docs.google.com/document/d/1gYm5Ji2Mge3QBdOliFV5gSPTKlX4q1DCBXIkiyMv62A/edit?ts=5ace0718#heading=h.m45webtwxf2d> and
> a supporting design doc for data source catalog APIs
> <https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.m45webtwxf2d>.
> From the comments on those docs, it looks like we mostly have agreement
> around standardizing plans and around the data source catalog API.
>
> We still need to work out details, like the transactional API extension,
> but I'd like to get started implementing those proposals so we have
> something working for the 2.4.0 release. I'm starting this thread because I
> think we're about ready to vote on the proposal
> <https://spark.apache.org/improvement-proposals.html#discussing-an-spip>
> and I'd like to get any remaining discussion going or get anyone that
> missed this to read through the docs.
>
> Thanks!
>
> rb
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: [DISCUSS] SPIP: Standardize SQL logical plans

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I just called a vote on this. I don't think we really need a shepherd if
there's enough interest for a vote to pass.

rb

On Tue, Jul 17, 2018 at 9:00 AM Cody Koeninger <co...@koeninger.org> wrote:

> According to
>
> http://spark.apache.org/improvement-proposals.html
>
> the shepherd should be a PMC member, not necessarily the person who
> proposed the SPIP


-- 
Ryan Blue
Software Engineer
Netflix

Re: [DISCUSS] SPIP: Standardize SQL logical plans

Posted by Cody Koeninger <co...@koeninger.org>.
According to

http://spark.apache.org/improvement-proposals.html

the shepherd should be a PMC member, not necessarily the person who
proposed the SPIP

On Tue, Jul 17, 2018 at 9:13 AM, Wenchen Fan <cl...@gmail.com> wrote:
> I don't know an official answer, but conventionally people who propose the
> SPIP would call the vote and "shepherd" the project. Other people can jump
> in during the development. I'm interested in the new API and like to work on
> it after the vote passes.
>
> Thanks,
> Wenchen
>
> On Fri, Jul 13, 2018 at 7:25 AM Ryan Blue <rb...@netflix.com> wrote:
>>
>> Thanks! I'm all for calling a vote on the SPIP. If I understand the
>> process correctly, the intent is for a "shepherd" to do it. I'm happy to
>> call a vote, or feel free if you'd like to play that role.
>>
>> Other comments:
>> * DeleteData API: I completely agree that we need to have a proposal for
>> it. I think the SQL side is easier because DELETE FROM is already a
>> statement. We just need to be able to identify v2 tables to use it. I'll
>> come up with something and send a proposal to the dev list.
>> * Table create/drop/alter/load API: I think we have agreement around the
>> proposed DataSourceV2 API, but we need to decide how the public API will
>> work and how this will fit in with ExternalCatalog (see the other thread for
>> discussion there). Do you think we need to get that entire SPIP approved
>> before we can start getting the API in? If so, what do you think needs to be
>> decided to get it ready?
>>
>> Thanks!
>>
>> rb
>>
>> On Wed, Jul 11, 2018 at 8:24 PM Wenchen Fan <cl...@gmail.com> wrote:
>>>
>>> Hi Ryan,
>>>
>>> Great job on this! Shall we call a vote for the plan standardization
>>> SPIP? I think this is a good idea and we should do it.
>>>
>>> Notes:
>>> We definitely need new user-facing APIs to produce these new logical
>>> plans like DeleteData. But we need a design doc for these new APIs after the
>>> SPIP passed.
>>> We definitely need the data source to provide the ability to
>>> create/drop/alter/lookup tables, but that belongs to the other SPIP and
>>> should be voted separately.
>>>
>>> Thanks,
>>> Wenchen
>>>
>>> On Fri, Apr 20, 2018 at 5:01 AM Ryan Blue <rb...@netflix.com.invalid>
>>> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> A few weeks ago, I wrote up a proposal to standardize SQL logical plans
>>>> and a supporting design doc for data source catalog APIs. From the comments
>>>> on those docs, it looks like we mostly have agreement around standardizing
>>>> plans and around the data source catalog API.
>>>>
>>>> We still need to work out details, like the transactional API extension,
>>>> but I'd like to get started implementing those proposals so we have
>>>> something working for the 2.4.0 release. I'm starting this thread because I
>>>> think we're about ready to vote on the proposal and I'd like to get any
>>>> remaining discussion going or get anyone that missed this to read through
>>>> the docs.
>>>>
>>>> Thanks!
>>>>
>>>> rb
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix


Re: [DISCUSS] SPIP: Standardize SQL logical plans

Posted by Wenchen Fan <cl...@gmail.com>.
I don't know of an official answer, but conventionally the people who
propose the SPIP call the vote and "shepherd" the project. Other people can
jump in during development. I'm interested in the new API and would like to
work on it after the vote passes.

Thanks,
Wenchen

On Fri, Jul 13, 2018 at 7:25 AM Ryan Blue <rb...@netflix.com> wrote:

> Thanks! I'm all for calling a vote on the SPIP. If I understand the
> process correctly, the intent is for a "shepherd" to do it. I'm happy to
> call a vote, or feel free if you'd like to play that role.
>
> Other comments:
> * DeleteData API: I completely agree that we need to have a proposal for
> it. I think the SQL side is easier because DELETE FROM is already a
> statement. We just need to be able to identify v2 tables to use it. I'll
> come up with something and send a proposal to the dev list.
> * Table create/drop/alter/load API: I think we have agreement around the
> proposed DataSourceV2 API, but we need to decide how the public API will
> work and how this will fit in with ExternalCatalog (see the other thread
> for discussion there). Do you think we need to get that entire SPIP
> approved before we can start getting the API in? If so, what do you think
> needs to be decided to get it ready?
>
> Thanks!
>
> rb
>
> On Wed, Jul 11, 2018 at 8:24 PM Wenchen Fan <cl...@gmail.com> wrote:
>
>> Hi Ryan,
>>
>> Great job on this! Shall we call a vote for the plan standardization
>> SPIP? I think this is a good idea and we should do it.
>>
>> Notes:
>> We definitely need new user-facing APIs to produce these new logical
>> plans like DeleteData. But we need a design doc for these new APIs after
>> the SPIP passed.
>> We definitely need the data source to provide the ability to
>> create/drop/alter/lookup tables, but that belongs to the other SPIP and
>> should be voted separately.
>>
>> Thanks,
>> Wenchen
>>
>> On Fri, Apr 20, 2018 at 5:01 AM Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> A few weeks ago, I wrote up a proposal to standardize SQL logical plans
>>> <https://docs.google.com/document/d/1gYm5Ji2Mge3QBdOliFV5gSPTKlX4q1DCBXIkiyMv62A/edit?ts=5ace0718#heading=h.m45webtwxf2d> and
>>> a supporting design doc for data source catalog APIs
>>> <https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.m45webtwxf2d>.
>>> From the comments on those docs, it looks like we mostly have agreement
>>> around standardizing plans and around the data source catalog API.
>>>
>>> We still need to work out details, like the transactional API extension,
>>> but I'd like to get started implementing those proposals so we have
>>> something working for the 2.4.0 release. I'm starting this thread because I
>>> think we're about ready to vote on the proposal
>>> <https://spark.apache.org/improvement-proposals.html#discussing-an-spip>
>>> and I'd like to get any remaining discussion going or get anyone that
>>> missed this to read through the docs.
>>>
>>> Thanks!
>>>
>>> rb
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: [DISCUSS] SPIP: Standardize SQL logical plans

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Thanks! I'm all for calling a vote on the SPIP. If I understand the process
correctly, the intent is for a "shepherd" to do it. I'm happy to call a
vote, or feel free if you'd like to play that role.

Other comments:
* DeleteData API: I completely agree that we need to have a proposal for
it. I think the SQL side is easier because DELETE FROM is already a
statement. We just need to be able to identify v2 tables to use it. I'll
come up with something and send a proposal to the dev list.
* Table create/drop/alter/load API: I think we have agreement around the
proposed DataSourceV2 API, but we need to decide how the public API will
work and how this will fit in with ExternalCatalog (see the other thread
for discussion there). Do you think we need to get that entire SPIP
approved before we can start getting the API in? If so, what do you think
needs to be decided to get it ready?

Thanks!

rb

On Wed, Jul 11, 2018 at 8:24 PM Wenchen Fan <cl...@gmail.com> wrote:

> Hi Ryan,
>
> Great job on this! Shall we call a vote for the plan standardization SPIP?
> I think this is a good idea and we should do it.
>
> Notes:
> We definitely need new user-facing APIs to produce these new logical plans
> like DeleteData. But we need a design doc for these new APIs after the SPIP
> passed.
> We definitely need the data source to provide the ability to
> create/drop/alter/lookup tables, but that belongs to the other SPIP and
> should be voted separately.
>
> Thanks,
> Wenchen
>
> On Fri, Apr 20, 2018 at 5:01 AM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> Hi everyone,
>>
>> A few weeks ago, I wrote up a proposal to standardize SQL logical plans
>> <https://docs.google.com/document/d/1gYm5Ji2Mge3QBdOliFV5gSPTKlX4q1DCBXIkiyMv62A/edit?ts=5ace0718#heading=h.m45webtwxf2d> and
>> a supporting design doc for data source catalog APIs
>> <https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.m45webtwxf2d>.
>> From the comments on those docs, it looks like we mostly have agreement
>> around standardizing plans and around the data source catalog API.
>>
>> We still need to work out details, like the transactional API extension,
>> but I'd like to get started implementing those proposals so we have
>> something working for the 2.4.0 release. I'm starting this thread because I
>> think we're about ready to vote on the proposal
>> <https://spark.apache.org/improvement-proposals.html#discussing-an-spip>
>> and I'd like to get any remaining discussion going or get anyone that
>> missed this to read through the docs.
>>
>> Thanks!
>>
>> rb
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Ryan Blue
Software Engineer
Netflix