Posted to dev@samza.apache.org by Cameron Lee <ca...@gmail.com> on 2020/03/22 18:26:58 UTC

[RESULT] [VOTE] SEP-24: Cluster-based Job Coordinator Dependency Isolation

Hi all,

I have updated the SEP to address the feedback received.

The vote on the SEP has been open for over 72 hours and has received 3
binding +1s (Prateek, Jagadish, Yi) and 1 non-binding +1 (Ke).

The vote has passed.

Thank you to everyone for the feedback.

Cameron

On Mon, Mar 16, 2020 at 4:07 PM Yi Pan <ni...@gmail.com> wrote:

> Hey, Cameron,
>
> Thanks for the detailed answers. It would be good to add this explanation
> to the SEP page as well.
>
> Otherwise, +1 from my side. Thanks!
>
> -Yi
>
> On Mon, Mar 16, 2020 at 10:06 AM Cameron Lee <ca...@gmail.com>
> wrote:
>
> > You have the correct understanding about the "yarn.resources.*"
> > configuration, and your question is a good one. Currently, the
> > implementation is that Samza will look in a specific place on the file
> > system (i.e. <current working directory>/__samzaFrameworkApi and <current
> > working directory>/__samzaFrameworkInfrastructure) to get the
> > API/infrastructure classpaths. I have a TODO in the code to make the file
> > system location configurable (or specified through an environment
> > variable). The configuration or environment variable for the file system
> > location would not be YARN-specific, and it would be applicable to any
> > execution environment.
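A minimal sketch of the lookup described above, assuming the framework directories sit directly under a base directory that defaults to the current working directory. The directory names come from the thread; the class name, method names, and the SAMZA_FRAMEWORK_BASE_DIR environment variable are hypothetical (the thread only mentions a TODO to make the location configurable), and this is not the actual Samza implementation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class FrameworkPaths {
    // Directory names as given in the thread.
    static final String API_DIR = "__samzaFrameworkApi";
    static final String INFRASTRUCTURE_DIR = "__samzaFrameworkInfrastructure";
    // Hypothetical variable name, used here only for illustration.
    static final String BASE_DIR_ENV = "SAMZA_FRAMEWORK_BASE_DIR";

    /** Returns the override if it is non-empty, otherwise the current working directory. */
    public static Path baseDir(String override) {
        if (override != null && !override.isEmpty()) {
            return Paths.get(override);
        }
        return Paths.get(System.getProperty("user.dir"));
    }

    /** Location of the framework API resources under the base directory. */
    public static Path apiDir(String override) {
        return baseDir(override).resolve(API_DIR);
    }

    /** Location of the framework infrastructure resources under the base directory. */
    public static Path infrastructureDir(String override) {
        return baseDir(override).resolve(INFRASTRUCTURE_DIR);
    }

    public static void main(String[] args) {
        // The override is null when the (hypothetical) variable is unset.
        String override = System.getenv(BASE_DIR_ENV);
        System.out.println(apiDir(override));
        System.out.println(infrastructureDir(override));
    }
}
```

Because the override is a plain path rather than a YARN setting, the same lookup would apply in any execution environment that places the resources on the local file system.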
> >
> > On Wed, Mar 11, 2020 at 10:54 PM Yi Pan <ni...@gmail.com> wrote:
> >
> > > OK. If I understand correctly, your answer is the following:
> > > yarn.resources.* configuration variables are used by YARN localizer to
> > > make API and infrastructure classpath available, together with the
> > > application's own classpath, which is also determined by the YARN
> > > localizer.
> > > The question here is: how do we let the container JVM know the
> > > API/infrastructure classpaths when launching the container processes? If
> > > the API and infrastructure classpaths (i.e. installation path determined
> > > by the localizer) are customizable, then we would need to tell the
> > > container JVM those API/infra classpaths via some configuration
> > > variables as well, right? Hence, those configuration variable names need
> > > to be understood by the Samza application's code (which is run within
> > > the container) as well. If not, what's the mechanism that we will use to
> > > let the container JVM process know where the YARN localizer has put the
> > > API/infra classpaths?
> > >
> > > Thanks!
> > >
> > > -Yi
> > >
> > >
> > >
> > > On Wed, Mar 11, 2020 at 8:09 PM Cameron Lee <ca...@gmail.com>
> > > wrote:
> > >
> > > > The configuration variables are only used by the YARN localizer. The
> > > > Samza application will look for the framework resources in certain
> > > > places in the application's working directory when it needs to access
> > > > them. My aim is to do something similar to how "yarn.package.path"
> > > > works. In other execution environments, it is my understanding that
> > > > "yarn.package.path" would get replaced by a different
> > > > environment-specific configuration key/value.
> > > > I agree that we should not use "yarn.resources.*" if the
> > > > configurations are not YARN-specific. Do you think that these resource
> > > > localization configs are generalizable to arbitrary environments? If
> > > > so, does that mean "yarn.package.path" is also generalizable? For
> > > > example, what if some execution environment does not use URLs to
> > > > specify resource locations (although maybe this isn't a reasonable
> > > > concern to worry about?)?
> > > >
> > > > Thanks,
> > > > Cameron
> > > >
> > > > On Wed, Mar 11, 2020 at 4:43 PM Yi Pan <ni...@gmail.com> wrote:
> > > >
> > > > > Hi, Cameron,
> > > > >
> > > > > Thanks for the quick responses! Appreciate it.
> > > > >
> > > > > I still have a concern on a): are those configuration variables
> > > > > used by the YARN localizer or by Samza applications? If those are
> > > > > used only by the YARN localizer, I agree that we should keep those
> > > > > as YARN-specific. Otherwise, I think it would still be better to
> > > > > name those cluster.based.resources.*. The reason being: Samza
> > > > > applications are supposed to be able to run on different execution
> > > > > environments. Ideally, when we are deploying the same Samza
> > > > > application on YARN vs Mesos or managed K8s clusters, we should only
> > > > > need to change the configuration values, not the configuration
> > > > > variable names and values. Does it make sense? Otherwise, we can
> > > > > schedule a conf call to clarify that.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > -Yi
> > > > >
> > > > > On Tue, Mar 10, 2020 at 3:25 PM Cameron Lee <cameronlee314@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > a) The "yarn.resources.*" configs are for localizing the necessary
> > > > > > resources into the working directory for the process. I felt that
> > > > > > the specific configuration format to specify these resources might
> > > > > > be YARN-specific (e.g. YARN has type and visibility configs for
> > > > > > each of its resources), so a generic format might not apply. In a
> > > > > > non-YARN case, the localization configs would need to be specified
> > > > > > according to the technology being used.
> > > > > > b) It is correct that the Avro version will need to be compatible
> > > > > > with the version that is used by the infrastructure, if
> > > > > > infrastructure needs to use Avro and pass the Avro object to the
> > > > > > application. This is the case with any serde technology that needs
> > > > > > to be used. For the job coordinator, it is not much of a concern
> > > > > > anyway, since it is not doing serde of Avro messages. This may be
> > > > > > more of a concern for general split deployment, which will impact
> > > > > > the processing containers, and will be a separate SEP.
> > > > > > c) It should work to leave infrastructure serdes in the
> > > > > > infrastructure classpath. The infrastructure serdes just see
> > > > > > generic types (which are java.lang.Object at runtime) for the
> > > > > > messages, and they don't do anything with the concrete types, so in
> > > > > > the infrastructure classes, the messages get passed around as
> > > > > > Object, but their concrete classes can still be loaded from the
> > > > > > application. As with (b), this is more of a concern for general
> > > > > > split deployment, since the job coordinator doesn't do message
> > > > > > serde. I have run some tests regarding this classloading pattern,
> > > > > > but we will do further verification for general split deployment.
> > > > > > d) Yes, you are correct. Good catch. It should be "described above
> > > > > > at Application classloader".
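A rough sketch of the pattern in (c), assuming the concrete message class is resolved by name through an application-side classloader while the infrastructure-side code only ever holds it as java.lang.Object. The class and method names are hypothetical, java.util.ArrayList stands in for an application message type, and the sketch reuses its own classloader where a real split deployment would use a separate application classloader; it is not the SEP's implementation.

```java
class SplitClassloadingSketch {
    /**
     * Loads a class by name through the given (application) classloader and
     * returns a new instance typed only as Object.
     */
    public static Object newMessage(ClassLoader applicationLoader, String className) {
        try {
            Class<?> concrete = Class.forName(className, true, applicationLoader);
            return concrete.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not load " + className, e);
        }
    }

    /** Infrastructure-side handling: forwards the message without using its concrete type. */
    public static Object passThrough(Object message) {
        return message;
    }

    public static void main(String[] args) {
        // Stand-in for the application classloader (assumption for this sketch).
        ClassLoader applicationLoader = SplitClassloadingSketch.class.getClassLoader();
        Object message = passThrough(newMessage(applicationLoader, "java.util.ArrayList"));
        System.out.println(message.getClass().getName());
    }
}
```

The infrastructure-side method compiles without any reference to the concrete message class, which is the property that lets the infrastructure classpath omit application types.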
> > > > > >
> > > > > > Thanks for all of your questions. I will clarify some details in
> > > > > > the doc regarding your questions.
> > > > > >
> > > > > > Cameron
> > > > > >
> > > > > > On Mon, Mar 9, 2020 at 12:07 PM Yi Pan <ni...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi, Cameron,
> > > > > > >
> > > > > > > Sorry to chime in late. Overall, looks great! I do have a few
> > > > > > > suggestions/questions before I can cast my vote here:
> > > > > > > a) for the configuration variable names, why are we limiting
> > > > > > > ourselves to yarn.resource.*? We have changed some of the
> > > > > > > configuration variables from yarn specific to non-yarn specific.
> > > > > > > I would love to keep that consistent (i.e. gradually moving all
> > > > > > > our yarn-specific configuration variables to non-yarn-specific
> > > > > > > names)
> > > > > > > b) for the avro case as referred to in the delegation case in
> > > > > > > the Infrastructure classloader, if we delegate the object
> > > > > > > deserialization class to the application classloader, would it
> > > > > > > be possible that the application provides a version of the avro
> > > > > > > class that is not compatible with the ones used within the
> > > > > > > "infrastructure plugins", hence causing a runtime exception in
> > > > > > > the infrastructure plugin? Or is the solution: do not directly
> > > > > > > use serde classes in the infrastructure code?
> > > > > > > c) following the description of infrastructure classloader flow,
> > > > > > > where should we expect the serde classes? In the application
> > > > > > > classpath, I guess? So, does that mean that we should exclude
> > > > > > > serde classes (including SerializableSerde and JsonSerdeV2) in
> > > > > > > the Samza infrastructure package, and tell the users to package
> > > > > > > them in the application package?
> > > > > > > d) I am a bit confused about the description on "multiple"
> > > > > > > application classloaders on the job coordinator: one is for the
> > > > > > > describe flow and the other is in the "Application" classloader,
> > > > > > > instead of "API" classloader, right?
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > -Yi
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Mar 4, 2020 at 11:32 AM Ke Wu <ke...@gmail.com> wrote:
> > > > > > >
> > > > > > > > +1.
> > > > > > > >
> > > > > > > > Thanks for driving this effort.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Ke
> > > > > > > >
> > > > > > > > > On Mar 3, 2020, at 6:28 PM, Jagadish Venkatraman
> > > > > > > > > <jagadish1989@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > +1 binding.
> > > > > > > > >
> > > > > > > > > > Thanks Cameron. I look forward to this feature taking our
> > > > > > > > > > "Stream Processing as a service" offering to the next level.
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > > > On Tuesday, March 3, 2020, Prateek Maheshwari
> > > > > > > > > > <prateekm@utexas.edu> wrote:
> > > > > > > > >
> > > > > > > > > >> +1 (binding) from me. Thanks for contributing this feature.
> > > > > > > > > >> Looking forward to having dependency isolation and to the
> > > > > > > > > >> ability to upgrade the framework independently from an
> > > > > > > > > >> application.
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >> Prateek
> > > > > > > > >>
> > > > > > > > > >> On Fri, Feb 28, 2020 at 10:48 AM Cameron Lee
> > > > > > > > > >> <cameronlee314@gmail.com> wrote:
> > > > > > > > >>
> > > > > > > > >>> Hi all,
> > > > > > > > >>>
> > > > > > > > > >>> This is a call for a vote on SEP-24: Cluster-based Job
> > > > > > > > > >>> Coordinator Dependency Isolation. Thanks to everyone who
> > > > > > > > > >>> reviewed the proposal and provided feedback.
> > > > > > > > > >>>
> > > > > > > > > >>> I have addressed comments on the SEP, and I am not aware of
> > > > > > > > > >>> any further major questions or objections, so I am starting
> > > > > > > > > >>> this vote.
> > > > > > > > >>>
> > > > > > > > >>> SEP link:
> > > > > > > > >>>
> > > > > > > > > >>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-24%3A+Cluster-based+Job+Coordinator+Dependency+Isolation
> > > > > > > > >>>
> > > > > > > > >>> Discuss thread:
> > > > > > > > >>>
> > > > > > > > >>>
> > > > > > > > > >>> https://mail-archives.apache.org/mod_mbox/samza-dev/202001.mbox/%3cCAMja7KeGcRZ3H95Rxk5XE=60zxm6jxJKjUWwxmGmaDPFbYKWQw@mail.gmail.com%3e
> > > > > > > > > >>> There was also some discussion through comments on the SEP
> > > > > > > > > >>> page (see Resolved Comments).
> > > > > > > > >>>
> > > > > > > > >>> Please vote:
> > > > > > > > >>> [ ] +1 approve
> > > > > > > > >>> [ ] +0 no opinion
> > > > > > > > >>> [ ] -1 disapprove (and reason why)
> > > > > > > > >>>
> > > > > > > > >>> Thank you,
> > > > > > > > >>> Cameron
> > > > > > > > >>>
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Jagadish
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>