
Posted to dev@spark.apache.org by Reynold Xin <rx...@databricks.com> on 2017/11/01 15:37:08 UTC

[Vote] SPIP: Continuous Processing Mode for Structured Streaming

Earlier I sent out a discussion thread for CP in Structured Streaming:

https://issues.apache.org/jira/browse/SPARK-20928

It is meant to be a very small, surgical change to Structured Streaming to
enable ultra-low latency. This is great timing because we are also
designing and implementing data source API v2. If designed properly, we can
have the same data source API working for both streaming and batch.


Following the SPIP process, I'm putting this SPIP up for a vote.

+1: Let's go ahead and design / implement the SPIP.
+0: Don't really care.
-1: I do not think this is a good idea for the following reasons.

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Noman Khan <no...@live.com>.

+1
for ultra-low latency

Thanks & Regards
Noman





Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Saisai Shao <sa...@gmail.com>.

+1, looking forward to more design details of this feature.

Thanks
Jerry

On Wed, Nov 8, 2017 at 6:40 AM, Shixiong(Ryan) Zhu <sh...@databricks.com>
wrote:

> +1

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.

+1

On Tue, Nov 7, 2017 at 1:34 PM, Joseph Bradley <jo...@databricks.com>
wrote:

> +1

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Joseph Bradley <jo...@databricks.com>.

+1

On Mon, Nov 6, 2017 at 5:11 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> +1


-- 

Joseph Bradley

Software Engineer - Machine Learning

Databricks, Inc.

[image: http://databricks.com] <http://databricks.com/>

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Michael Armbrust <mi...@databricks.com>.

+1

On Sat, Nov 4, 2017 at 11:02 AM, Xiao Li <ga...@gmail.com> wrote:

> +1

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Xiao Li <ga...@gmail.com>.

+1

2017-11-04 11:00 GMT-07:00 Burak Yavuz <br...@gmail.com>:

> +1

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Burak Yavuz <br...@gmail.com>.

+1

On Fri, Nov 3, 2017 at 10:02 PM, vaquar khan <va...@gmail.com> wrote:

> +1

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by vaquar khan <va...@gmail.com>.

+1

On Fri, Nov 3, 2017 at 8:14 PM, Weichen Xu <we...@databricks.com>
wrote:

> +1.


-- 
Regards,
Vaquar Khan
+1 -224-436-0783
Greater Chicago

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Weichen Xu <we...@databricks.com>.

+1.

On Sat, Nov 4, 2017 at 8:04 AM, Matei Zaharia <ma...@gmail.com>
wrote:

> +1 from me too.
>
> Matei

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Matei Zaharia <ma...@gmail.com>.

+1 from me too.

Matei

> On Nov 3, 2017, at 4:59 PM, Wenchen Fan <cl...@gmail.com> wrote:
> 
> +1.
> 
> I think this architecture makes a lot of sense to let executors talk to source/sink directly, and bring very low latency.


---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Wenchen Fan <cl...@gmail.com>.

+1.

I think this architecture makes a lot of sense to let executors talk to
source/sink directly, and bring very low latency.

On Thu, Nov 2, 2017 at 9:01 AM, Sean Owen <so...@cloudera.com> wrote:

> +0 simply because I don't feel I know enough to have an opinion. I have no
> reason to doubt the change though, from a skim through the doc.

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Sean Owen <so...@cloudera.com>.

+0 simply because I don't feel I know enough to have an opinion. I have no
reason to doubt the change though, from a skim through the doc.


Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Reynold Xin <rx...@databricks.com>.

Thanks Tom. I'd imagine more details belong either in a full design doc, or
a PR description. Might make sense to do an additional design doc, if there
is enough delta from the current sketch doc.


On Mon, Nov 6, 2017 at 7:29 AM, Tom Graves <tg...@yahoo.com> wrote:

> +1 for the idea and feature, but I think the design is definitely lacking
> detail on the internal changes needed and how the execution pieces work and
> the communication.  Are you planning on posting more of those details or
> were you just planning on discussing in PR?
>
> Tom

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Tom Graves <tg...@yahoo.com.INVALID>.

+1 for the idea and feature, but I think the design is definitely lacking
detail on the internal changes needed and how the execution pieces work and
the communication. Are you planning on posting more of those details or
were you just planning on discussing in PR?

Tom

On Wednesday, November 1, 2017, 11:29:21 AM CDT, Debasish Das <de...@gmail.com> wrote:

> +1
>
> Is there any design doc related to API/internal changes? Will CP be the
> default in structured streaming or it's a mode in conjunction with
> existing behavior.
>
> Thanks.
> Deb


Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Reynold Xin <rx...@databricks.com>.

I just replied.


On Wed, Nov 1, 2017 at 5:50 PM, Cody Koeninger <co...@koeninger.org> wrote:

> Was there any answer to my question around the effect of changes to
> the sink api regarding access to underlying offsets?

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Cody Koeninger <co...@koeninger.org>.

Was there any answer to my question around the effect of changes to
the sink api regarding access to underlying offsets?

On Wed, Nov 1, 2017 at 11:32 AM, Reynold Xin <rx...@databricks.com> wrote:
> Most of those should be answered by the attached design sketch in the JIRA
> ticket.

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Reynold Xin <rx...@databricks.com>.

Most of those should be answered by the attached design sketch in the JIRA
ticket.

On Wed, Nov 1, 2017 at 5:29 PM Debasish Das <de...@gmail.com>
wrote:

> +1
>
> Is there any design doc related to API/internal changes? Will CP be the
> default in structured streaming or it's a mode in conjunction with
> existing behavior.
>
> Thanks.
> Deb

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Debasish Das <de...@gmail.com>.

+1

Is there any design doc related to API/internal changes? Will CP be the
default in structured streaming or it's a mode in conjunction with
existing behavior.

Thanks.
Deb


Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

Posted by Reynold Xin <rx...@databricks.com>.

The vote has passed with the following +1s:

Reynold Xin*
Debasish Das
Noman Khan
Wenchen Fan*
Matei Zaharia*
Weichen Xu
Vaquar Khan
Burak Yavuz
Xiao Li
Tom Graves*
Michael Armbrust*
Joseph Bradley*
Shixiong Zhu*


And the following +0s:

Sean Owen*


Thanks for the feedback!

