Posted to dev@spark.apache.org by Wenchen Fan <cl...@gmail.com> on 2017/08/17 13:02:57 UTC

[VOTE] [SPIP] SPARK-15689: Data Source API V2

Hi all,

Following the SPIP process, I'm putting this SPIP up for a vote.

The current data source API doesn't work well because of limitations such
as no partitioning/bucketing support, no columnar read, and difficulty
supporting more operator push-down.

I'm proposing a Data Source API V2 to address these problems. Please read
the full document at
https://issues.apache.org/jira/secure/attachment/12882332/SPIP%20Data%20Source%20API%20V2.pdf

Since this SPIP is mostly about APIs, I also created a prototype and put
java docs on these interfaces, so that it's easier to review these
interfaces and discuss: https://github.com/cloud-fan/spark/pull/10/files

The vote will be up for the next 72 hours. Please reply with your vote:

+1: Yeah, let's go forward and implement the SPIP.
+0: Don't really care.
-1: I don't think this is a good idea because of the following
technical reasons.

Thanks!

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

Posted by 蒋星博 <ji...@gmail.com>.

+1 (non-binding)

On Thu, Aug 17, 2017 at 9:05 PM, Wenchen Fan <cl...@gmail.com> wrote:

> adding my own +1 (binding)
>
> On Thu, Aug 17, 2017 at 9:02 PM, Wenchen Fan <cl...@gmail.com> wrote:
>
>> Hi all,
>>
>> Following the SPIP process, I'm putting this SPIP up for a vote.
>>
>> The current data source API doesn't work well because of some limitations
>> like: no partitioning/bucketing support, no columnar read, hard to support
>> more operator push down, etc.
>>
>> I'm proposing a Data Source API V2 to address these problems, please read
>> the full document at
>> https://issues.apache.org/jira/secure/attachment/12882332/SPIP%20Data%20Source%20API%20V2.pdf
>>
>> Since this SPIP is mostly about APIs, I also created a prototype and put
>> java docs on these interfaces, so that it's easier to review these
>> interfaces and discuss: https://github.com/cloud-fan/spark/pull/10/files
>>
>> The vote will be up for the next 72 hours. Please reply with your vote:
>>
>> +1: Yeah, let's go forward and implement the SPIP.
>> +0: Don't really care.
>> -1: I don't think this is a good idea because of the following
>> technical reasons.
>>
>> Thanks!
>>
>
>

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

Posted by Mark Hamstra <ma...@clearstorydata.com>.

Points 2, 3 and 4 of the Project Plan in that document (i.e. "port existing
data sources using internal APIs to use the proposed public Data Source V2
API") have my full support. Really, I'd like to see that dog-fooding effort
completed and the lessons learned from it fully digested before we remove any
unstable annotations from the new API. It's okay to get a proposal out
there so that we can talk about it and start implementing and using it
internally, followed by external use under the unstable annotations, but I
don't want to see a premature vote on a final form of a new public API.

On Thu, Aug 17, 2017 at 8:55 AM, Reynold Xin <rx...@databricks.com> wrote:

> Yea I don't think it's a good idea to upload a doc and then call for a
> vote immediately. People need time to digest ...
>
>
> On Thu, Aug 17, 2017 at 6:22 AM, Wenchen Fan <cl...@gmail.com> wrote:
>
>> Sorry let's remove the VOTE tag as I just wanna bring this up for
>> discussion.
>>
>> I'll restart the voting process after we have enough discussion on the
>> JIRA ticket or here in this email thread.
>>
>> On Thu, Aug 17, 2017 at 9:12 PM, Russell Spitzer <
>> russell.spitzer@gmail.com> wrote:
>>
>>> -1, I don't think there has really been any discussion of this api
>>> change yet or at least it hasn't occurred on the jira ticket
>>>
>>> On Thu, Aug 17, 2017 at 8:05 AM Wenchen Fan <cl...@gmail.com> wrote:
>>>
>>>> adding my own +1 (binding)
>>>>
>>>> On Thu, Aug 17, 2017 at 9:02 PM, Wenchen Fan <cl...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Following the SPIP process, I'm putting this SPIP up for a vote.
>>>>>
>>>>> The current data source API doesn't work well because of some
>>>>> limitations like: no partitioning/bucketing support, no columnar read, hard
>>>>> to support more operator push down, etc.
>>>>>
>>>>> I'm proposing a Data Source API V2 to address these problems, please
>>>>> read the full document at
>>>>> https://issues.apache.org/jira/secure/attachment/12882332/SPIP%20Data%20Source%20API%20V2.pdf
>>>>>
>>>>> Since this SPIP is mostly about APIs, I also created a prototype and
>>>>> put java docs on these interfaces, so that it's easier to review these
>>>>> interfaces and discuss: https://github.com/cloud-fan/spark/pull/10/files
>>>>>
>>>>> The vote will be up for the next 72 hours. Please reply with your vote:
>>>>>
>>>>> +1: Yeah, let's go forward and implement the SPIP.
>>>>> +0: Don't really care.
>>>>> -1: I don't think this is a good idea because of the following
>>>>> technical reasons.
>>>>>
>>>>> Thanks!
>>>>>
>>>>
>>>>
>>
>

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

Posted by Reynold Xin <rx...@databricks.com>.

Yea I don't think it's a good idea to upload a doc and then call for a vote
immediately. People need time to digest ...


On Thu, Aug 17, 2017 at 6:22 AM, Wenchen Fan <cl...@gmail.com> wrote:

> Sorry let's remove the VOTE tag as I just wanna bring this up for
> discussion.
>
> I'll restart the voting process after we have enough discussion on the
> JIRA ticket or here in this email thread.
>
> On Thu, Aug 17, 2017 at 9:12 PM, Russell Spitzer <
> russell.spitzer@gmail.com> wrote:
>
>> -1, I don't think there has really been any discussion of this api change
>> yet or at least it hasn't occurred on the jira ticket
>>
>> On Thu, Aug 17, 2017 at 8:05 AM Wenchen Fan <cl...@gmail.com> wrote:
>>
>>> adding my own +1 (binding)
>>>
>>> On Thu, Aug 17, 2017 at 9:02 PM, Wenchen Fan <cl...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Following the SPIP process, I'm putting this SPIP up for a vote.
>>>>
>>>> The current data source API doesn't work well because of some
>>>> limitations like: no partitioning/bucketing support, no columnar read, hard
>>>> to support more operator push down, etc.
>>>>
>>>> I'm proposing a Data Source API V2 to address these problems, please
>>>> read the full document at
>>>> https://issues.apache.org/jira/secure/attachment/12882332/SPIP%20Data%20Source%20API%20V2.pdf
>>>>
>>>> Since this SPIP is mostly about APIs, I also created a prototype and
>>>> put java docs on these interfaces, so that it's easier to review these
>>>> interfaces and discuss: https://github.com/cloud-fan/spark/pull/10/files
>>>>
>>>> The vote will be up for the next 72 hours. Please reply with your vote:
>>>>
>>>> +1: Yeah, let's go forward and implement the SPIP.
>>>> +0: Don't really care.
>>>> -1: I don't think this is a good idea because of the following
>>>> technical reasons.
>>>>
>>>> Thanks!
>>>>
>>>
>>>
>

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

Posted by Wenchen Fan <cl...@gmail.com>.

Sorry let's remove the VOTE tag as I just wanna bring this up for
discussion.

I'll restart the voting process after we have enough discussion on the JIRA
ticket or here in this email thread.

On Thu, Aug 17, 2017 at 9:12 PM, Russell Spitzer <ru...@gmail.com>
wrote:

> -1, I don't think there has really been any discussion of this api change
> yet or at least it hasn't occurred on the jira ticket
>
> On Thu, Aug 17, 2017 at 8:05 AM Wenchen Fan <cl...@gmail.com> wrote:
>
>> adding my own +1 (binding)
>>
>> On Thu, Aug 17, 2017 at 9:02 PM, Wenchen Fan <cl...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Following the SPIP process, I'm putting this SPIP up for a vote.
>>>
>>> The current data source API doesn't work well because of some
>>> limitations like: no partitioning/bucketing support, no columnar read, hard
>>> to support more operator push down, etc.
>>>
>>> I'm proposing a Data Source API V2 to address these problems, please
>>> read the full document at
>>> https://issues.apache.org/jira/secure/attachment/12882332/SPIP%20Data%20Source%20API%20V2.pdf
>>>
>>> Since this SPIP is mostly about APIs, I also created a prototype and put
>>> java docs on these interfaces, so that it's easier to review these
>>> interfaces and discuss: https://github.com/cloud-fan/spark/pull/10/files
>>>
>>> The vote will be up for the next 72 hours. Please reply with your vote:
>>>
>>> +1: Yeah, let's go forward and implement the SPIP.
>>> +0: Don't really care.
>>> -1: I don't think this is a good idea because of the following
>>> technical reasons.
>>>
>>> Thanks!
>>>
>>
>>

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

Posted by Russell Spitzer <ru...@gmail.com>.

-1, I don't think there has really been any discussion of this api change
yet or at least it hasn't occurred on the jira ticket

On Thu, Aug 17, 2017 at 8:05 AM Wenchen Fan <cl...@gmail.com> wrote:

> adding my own +1 (binding)
>
> On Thu, Aug 17, 2017 at 9:02 PM, Wenchen Fan <cl...@gmail.com> wrote:
>
>> Hi all,
>>
>> Following the SPIP process, I'm putting this SPIP up for a vote.
>>
>> The current data source API doesn't work well because of some limitations
>> like: no partitioning/bucketing support, no columnar read, hard to support
>> more operator push down, etc.
>>
>> I'm proposing a Data Source API V2 to address these problems, please read
>> the full document at
>> https://issues.apache.org/jira/secure/attachment/12882332/SPIP%20Data%20Source%20API%20V2.pdf
>>
>> Since this SPIP is mostly about APIs, I also created a prototype and put
>> java docs on these interfaces, so that it's easier to review these
>> interfaces and discuss: https://github.com/cloud-fan/spark/pull/10/files
>>
>> The vote will be up for the next 72 hours. Please reply with your vote:
>>
>> +1: Yeah, let's go forward and implement the SPIP.
>> +0: Don't really care.
>> -1: I don't think this is a good idea because of the following
>> technical reasons.
>>
>> Thanks!
>>
>
>

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

Posted by Wenchen Fan <cl...@gmail.com>.

adding my own +1 (binding)

On Thu, Aug 17, 2017 at 9:02 PM, Wenchen Fan <cl...@gmail.com> wrote:

> Hi all,
>
> Following the SPIP process, I'm putting this SPIP up for a vote.
>
> The current data source API doesn't work well because of some limitations
> like: no partitioning/bucketing support, no columnar read, hard to support
> more operator push down, etc.
>
> I'm proposing a Data Source API V2 to address these problems, please read
> the full document at
> https://issues.apache.org/jira/secure/attachment/12882332/SPIP%20Data%20Source%20API%20V2.pdf
>
> Since this SPIP is mostly about APIs, I also created a prototype and put
> java docs on these interfaces, so that it's easier to review these
> interfaces and discuss: https://github.com/cloud-fan/spark/pull/10/files
>
> The vote will be up for the next 72 hours. Please reply with your vote:
>
> +1: Yeah, let's go forward and implement the SPIP.
> +0: Don't really care.
> -1: I don't think this is a good idea because of the following
> technical reasons.
>
> Thanks!
>