Posted to dev@samza.apache.org by Bharath Kumara Subramanian <co...@gmail.com> on 2017/09/06 17:45:15 UTC

[VOTE] SEP-8: Add in-memory system consumer & producer

Hi all,

Can you please vote for SEP-8?
You can find the design document here
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71013043>.

Thanks,
Bharath

Re: [VOTE] SEP-8: Add in-memory system consumer & producer

Posted by Yi Pan <ni...@gmail.com>.
+1. The updated design looks good to me! Looking forward to its
implementation.

Minor: if this SEP has a direct dependency on SEP-2 and other
ApplicationRunner-related API changes, please make it clear (i.e., what we
can implement without being blocked and what needs to wait until the API
changes are committed).

-Yi

Re: [VOTE] SEP-8: Add in-memory system consumer & producer

Posted by Jagadish Venkatraman <ja...@gmail.com>.
LGTM, +1 on the overall design. This will drastically improve testing of
Samza applications!

--
Jagdish

-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University

Re: [VOTE] SEP-8: Add in-memory system consumer & producer

Posted by Bharath Kumarasubramanian <bk...@linkedin.com>.
Thanks for your feedback. Answers inline.


On 9/14/17, 1:23 AM, "Yi Pan" <ni...@gmail.com> wrote:

    Hi, Bharath,
    
    Overall looks good! I have the following comments:
    
    i) Question on the Type of IME + data partition:
    
    How do we enforce that the user adds an IME w/ the expected partition id to the
    corresponding sub-collection?
    
    For IME as the data source, we will take a flat collection instead of a collection of collections, since we already know the partition information.
    I will update the wiki to make this clearer and more explicit. Let me know if this is acceptable.
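To illustrate the answer to (i): a minimal Java sketch, assuming a simplified stand-in for IncomingMessageEnvelope (IME). `Envelope`, `InputGrouping`, and `groupByPartition` are hypothetical names, not part of the Samza API or the SEP-8 design. Because every envelope carries its own partition id, the user can pass a flat list and the per-partition buffers are derived from it, so there is no sub-collection the user could fill incorrectly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for Samza's IncomingMessageEnvelope (IME):
// each message carries the partition it belongs to.
final class Envelope {
  final int partition;
  final String message;

  Envelope(int partition, String message) {
    this.partition = partition;
    this.message = message;
  }
}

// Hypothetical helper: derives per-partition buffers from a flat input list,
// so the user never has to place envelopes into the "right" sub-collection.
final class InputGrouping {
  private InputGrouping() {}

  static Map<Integer, List<String>> groupByPartition(List<Envelope> input) {
    Map<Integer, List<String>> byPartition = new TreeMap<>();
    for (Envelope e : input) {
      if (e.partition < 0) {
        throw new IllegalArgumentException("invalid partition: " + e.partition);
      }
      // Arrival order is preserved within each partition.
      byPartition.computeIfAbsent(e.partition, p -> new ArrayList<>()).add(e.message);
    }
    return byPartition;
  }
}
```

For example, envelopes for partitions 1, 0, 1 with messages "a", "b", "c" group into `{0=[b], 1=[a, c]}`: order within a partition is kept, and partitions come out sorted only because the sketch uses a `TreeMap`.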
    
    
    ii) In the architecture graph, what's the difference between SSP queues and
    Data source/sink? What is the layer exposed to the user (i.e., the programmer)?
    
    SSP queues are intermediate buffers that the in-memory system uses to pass messages; they are not exposed to the programmer.
    Data source/sink refers to the handles for the end user's data: the source is the input data provided by the end user, and the sink is the output to which the system flushes data for the end user to access.
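The split between internal SSP queues and the user-facing data source/sink can be sketched in Java as below. This is a minimal illustration under simplifying assumptions (string messages, a single task function, single-threaded execution); `InMemoryPipelineSketch`, `addInput`, `run`, and `readOutput` are hypothetical names, not the SEP-8 API. The queues stay private; the user only supplies input and reads the flushed, read-only output.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.UnaryOperator;

// Hypothetical sketch (not the SEP-8 API): SSP queues are private buffers,
// while the user only deals with a data source (input) and a data sink (output).
final class InMemoryPipelineSketch {
  // Internal SSP queues keyed by "stream#partition"; never exposed to the user.
  private final Map<String, Queue<String>> sspQueues = new HashMap<>();
  // Data sink: output that the system flushes for the user to read.
  private final Map<String, List<String>> sink = new HashMap<>();

  private static String ssp(String stream, int partition) {
    return stream + "#" + partition;
  }

  private Queue<String> queue(String stream, int partition) {
    return sspQueues.computeIfAbsent(ssp(stream, partition), k -> new ArrayDeque<>());
  }

  // Data source: the user supplies input messages for one partition.
  void addInput(String stream, int partition, List<String> messages) {
    queue(stream, partition).addAll(messages);
  }

  // Internal plumbing: drain the input SSP queue through the task into the
  // output SSP queue, then flush the output queue into the sink.
  void run(String in, String out, int partition, UnaryOperator<String> task) {
    if (in.equals(out)) {
      throw new IllegalArgumentException("input and output streams must differ");
    }
    Queue<String> input = queue(in, partition);
    Queue<String> output = queue(out, partition);
    while (!input.isEmpty()) {
      output.add(task.apply(input.poll()));
    }
    List<String> flushed = sink.computeIfAbsent(ssp(out, partition), k -> new ArrayList<>());
    while (!output.isEmpty()) {
      flushed.add(output.poll());
    }
  }

  // Data sink: the user reads a read-only view of flushed output, never the queues.
  List<String> readOutput(String stream, int partition) {
    return Collections.unmodifiableList(
        sink.getOrDefault(ssp(stream, partition), Collections.emptyList()));
  }
}
```

After `addInput("in", 0, Arrays.asList("a", "b"))` and `run("in", "out", 0, String::toUpperCase)`, `readOutput("out", 0)` returns `[A, B]`; nothing outside the class can reach the queues themselves.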
    
    
    iii) Agree w/ the approach of using customized queues managed by the admin.
    However, the reason for not using BEM is not very clear. As a matter of
    fact, BEM is just one optional base class for a SystemConsumer implementation.
    Not sure why we necessarily need to be limited by BEM.
    
    I agree that BEM is just an optional helper class with a bunch of utility methods for implementing a SystemConsumer. However, going down that route would require either the SystemProducer implementation to hold a reference to the SystemConsumer so that it can write into the same buffer, or a single implementation that acts as both consumer and producer. This isn't a limitation as such, but it is what we would sign up for with the BEM-based approaches. The benefits BEM brings don't justify this for our use case, hence approach C.
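The admin-managed queue approach described in this answer can be sketched as follows. It assumes string messages and SSPs identified by plain strings; `QueueAdminSketch`, `QueueProducerSketch`, and `QueueConsumerSketch` are hypothetical stand-ins, not Samza's SystemAdmin, SystemProducer, or SystemConsumer interfaces. Both sides share buffers owned by the admin, so the producer holds no reference to the consumer and no single class has to act as both, which is the coupling a BlockingEnvelopeMap (BEM) based consumer would introduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the admin that owns one queue per SSP.
final class QueueAdminSketch {
  private final Map<String, BlockingQueue<String>> queues = new ConcurrentHashMap<>();

  BlockingQueue<String> queueFor(String ssp) {
    return queues.computeIfAbsent(ssp, k -> new LinkedBlockingQueue<>());
  }
}

// Producer writes through the admin; it holds no reference to any consumer.
final class QueueProducerSketch {
  private final QueueAdminSketch admin;

  QueueProducerSketch(QueueAdminSketch admin) {
    this.admin = admin;
  }

  void send(String ssp, String message) {
    admin.queueFor(ssp).add(message);
  }
}

// Consumer reads through the same admin; it holds no reference to any producer.
final class QueueConsumerSketch {
  private final QueueAdminSketch admin;

  QueueConsumerSketch(QueueAdminSketch admin) {
    this.admin = admin;
  }

  // Returns up to maxMessages messages that are already buffered, without blocking.
  List<String> poll(String ssp, int maxMessages) {
    List<String> batch = new ArrayList<>();
    admin.queueFor(ssp).drainTo(batch, maxMessages);
    return batch;
  }
}
```

`ConcurrentHashMap` and `LinkedBlockingQueue` are used so that the producer and consumer could run on different threads; since `drainTo` only moves what is already buffered, `poll` in this sketch never waits for new messages.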
    
    iv) In the code examples,
    
    A) What's the difference between durable and non-durable state in the
    high-level API? I don't see any difference. Also, the SEP has clearly
    described that the design is only for InMemory input/output/intermediate
    streams. I noticed that you added the changelog as an input in the low-level
    API. But it is not clear how this changelog is defined and why it is an input
    to the application???
    
    The changelog is supposed to be wired through the StoreDescriptor. Since this is not supported in V1, I will remove that use case.
    I will also add a section on unsupported use cases and list these there for bookkeeping, so that we can revisit them for V2.
    
    B) The code example for checkpoint is empty, and we have stated that we
    won't support checkpoint in this SEP. Can we remove it?
    
    Removed it.
    
    
    Thanks!
    
    
    -Yi
    
Re: [VOTE] SEP-8: Add in-memory system consumer & producer

Posted by Yi Pan <ni...@gmail.com>.
Hi, Bharath,

Overall looks good! I have the following comments:

i) Question on the Type of IME + data partition:

How do we enforce that the user adds an IME w/ the expected partition id to the
corresponding sub-collection?



ii) In the architecture graph, what's the difference between SSP queues and
Data source/sink? What is the layer exposed to the user (i.e., the programmer)?



iii) Agree w/ the approach of using customized queues managed by the admin.
However, the reason for not using BEM is not very clear. As a matter of
fact, BEM is just one optional base class for a SystemConsumer implementation.
Not sure why we necessarily need to be limited by BEM.



iv) In the code examples,

A) What's the difference between durable and non-durable state in the
high-level API? I don't see any difference. Also, the SEP has clearly
described that the design is only for InMemory input/output/intermediate
streams. I noticed that you added the changelog as an input in the low-level
API. But it is not clear how this changelog is defined and why it is an input
to the application???

B) The code example for checkpoint is empty, and we have stated that we
won't support checkpoint in this SEP. Can we remove it?


Thanks!


-Yi

Re: [VOTE] SEP-8: Add in-memory system consumer & producer

Posted by Navina Ramesh <nr...@linkedin.com>.
Hi Bharath,


Really good design!


  1.  Your SEP lists 3 implementation approaches. Do you know which one we are choosing? I suspect it is Approach C. Can you please confirm and update the SEP?
  2.  Perhaps rename "Test Plan" to "Proposed Usage" or "Usage Example".

Overall, +1 on this. We need this asap!! 😊

Thanks!

Navina

Re: [VOTE] SEP-8: Add in-memory system consumer & producer

Posted by xinyu liu <xi...@gmail.com>.
+1 on the overall design. This will make testing a lot easier!

Thanks,
Xinyu
