Posted to commits@samza.apache.org by "Cameron Lee (Confluence)" <no...@apache.org> on 2019/11/19 03:40:00 UTC

[CONF] Apache Samza > SEP-24: Cluster-based Job Coordinator Dependency Isolation


#  Overview

A deployable Samza application currently consists of JARs for Samza
infrastructure code (and its dependencies) and JARs for application-specific
code (and its dependencies). The full deployable package is determined at
build time. When an application is deployed, the built package of JARs is
placed on every node that needs it, including the nodes that run the job
coordinator and the processing containers. This build-time packaging has a
benefit: it simplifies the deployment responsibilities of Samza
infrastructure, because the package built by the application has everything
needed to run a Samza application. Application owners (who may not be the
same as the owners of the Samza infrastructure) choose the version of Samza
to use and do the packaging.

One pain point of this model is dependency management. Since applications do
the packaging of JARs, it is up to them to do dependency conflict resolution.
If application-specific code builds against one version of a dependency and
Samza infrastructure code builds against a different version of that same
dependency, then only one of those versions will actually get used at
runtime. Code built against the other version may then fail at runtime with
errors like ClassNotFoundException. Some parts of Samza infrastructure are
relatively agnostic to application-specific code (e.g. YARN application
master), but they can still be impacted by how an application packages its
JARs (e.g. which dependencies are included). Samza infrastructure is
validated against a certain set of dependencies, but applications can still
change the runtime dependencies that are actually used. These issues lower
availability and cost debugging time. It is also up to the application to
fix the packaging.
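
The failure mode described above can be illustrated with a small model. This is only a sketch, not Samza code: the library "lib", its versions, and the class names are hypothetical, and the rule that the application's version wins is an assumption made for the demo (actual conflict resolution depends on the build tool).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative model only: "lib", its versions, and its class names are hypothetical.
final class DependencyConflictSketch {
    // Classes contained in each version of the hypothetical library.
    // Version 2.0 adds a class that version 1.0 does not have.
    static final Map<String, Set<String>> CLASSES_BY_VERSION = new HashMap<>();
    static {
        CLASSES_BY_VERSION.put("1.0", new HashSet<>(Arrays.asList("lib.Client")));
        CLASSES_BY_VERSION.put("2.0",
            new HashSet<>(Arrays.asList("lib.Client", "lib.RetryPolicy")));
    }

    // The package can hold only one version. Assumption for this sketch:
    // the application's choice wins, because the application does the packaging.
    static String packagedVersion(String appVersion, String samzaVersion) {
        return appVersion;
    }

    // True if the given version of the library contains the class.
    static boolean canLoad(String version, String className) {
        Set<String> classes = CLASSES_BY_VERSION.get(version);
        return classes != null && classes.contains(className);
    }

    public static void main(String[] args) {
        // The application builds against 1.0, Samza infrastructure against 2.0.
        String packaged = packagedVersion("1.0", "2.0");
        // Application code finds the class it was built against.
        System.out.println("lib.Client loadable: " + canLoad(packaged, "lib.Client"));
        // Infrastructure code that needs a 2.0-only class would fail at this
        // point in a real JVM, typically with a ClassNotFoundException.
        System.out.println("lib.RetryPolicy loadable: " + canLoad(packaged, "lib.RetryPolicy"));
    }
}
```

In this model the infrastructure code was validated against version 2.0 but runs against 1.0, which is the mismatch the paragraph above describes.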

It would be helpful to be able to isolate the dependencies of the Samza
infrastructure from the dependencies of the application. This SEP covers how
to achieve this for the cluster-based job coordinator, which is used when
running Samza jobs in resource management systems like YARN.
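
As a rough illustration of what dependency isolation can mean on the JVM, the sketch below uses separate class loaders. It shows a general JVM mechanism and is not a statement of the design this SEP adopts; the class name IsolationSketch and its helper methods are hypothetical. A class loader whose parent is the bootstrap loader can see JDK core classes but not classes on the application classpath.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical demo class; not part of Samza.
final class IsolationSketch {
    // Returns true if the given loader can resolve the named class.
    static boolean canLoad(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // A loader with no JARs of its own and the bootstrap loader as parent
    // (parent == null). It does not delegate to the application class loader.
    static ClassLoader isolatedLoader() {
        return new URLClassLoader(new URL[0], null);
    }

    public static void main(String[] args) {
        ClassLoader application = IsolationSketch.class.getClassLoader();
        ClassLoader isolated = isolatedLoader();
        String self = IsolationSketch.class.getName();

        // The loader that loaded this class can resolve it again.
        System.out.println("application sees demo class: " + canLoad(application, self));
        // The isolated loader cannot see application-level classes.
        System.out.println("isolated sees demo class: " + canLoad(isolated, self));
        // JDK core classes remain visible through the bootstrap loader.
        System.out.println("isolated sees java.lang.String: "
            + canLoad(isolated, "java.lang.String"));
    }
}
```

In a real setup the isolated loader would be given the URLs of the infrastructure JARs, so that infrastructure code resolves its dependencies from that set only, regardless of what the application packaged.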

##  Terms

| Term | Description |
|---|---|
| cluster-based job coordinator | process that is responsible for managing the processing containers of a Samza job (e.g. starting containers, keeping correct # of containers running) when running Samza with a resource management system |
| YARN | a resource management system which can be used to run Samza jobs |
| application master | a cluster-based job coordinator in the context of YARN |
| application runner | Samza component which is responsible for launching an application |
| application (or application-specific) | code and dependencies which are specific to a particular Samza application, as opposed to Samza infrastructure |
| pluggable (or plugin) class | class which is specified by an application through configuration (e.g. system factory, grouper) |
  
#  Requirements

  * Application dependencies should not be able to impact the Samza cluster-based job coordinator.
  * The solution should be reusable for the Samza logic running on processing containers.

  