Posted to commits@samza.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2014/10/21 23:06:34 UTC
[jira] [Updated] (SAMZA-437) Remove TaskLifecycleListener
[ https://issues.apache.org/jira/browse/SAMZA-437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Riccomini updated SAMZA-437:
----------------------------------
Attachment: SAMZA-437-0.patch
Attaching draft patch. Review board is at:
https://reviews.apache.org/r/26992/
Changes made:
# Removed environment variable compression (SAMZA-337), since we're passing via HTTP now.
# Converted YARN's WebAppServer to a generic HttpServer in samza-core.
# Wrote JobServlet, which serves a job's config, task:SSP mapping, and task:changelog partition mapping via HTTP JSON.
# Updated ShellCommandBuilder/CommandBuilder to set an HTTP URL environment variable, rather than config, task:SSP mapping, and task:changelog partition mapping.
# Updated ProcessJob, ThreadJob, and the AM code to run the HttpServer/JobServlet, and set the HTTP URL environment variable when starting SamzaContainers.
# Updated SamzaContainer to fetch config, task:SSP mapping, and task:changelog partition mapping using the HTTP URL environment variable.
# Changed container name to container ID in SamzaContainerContext and run-*.sh. Kept legacy "samza.container.name" system property, so we're backwards compatible with log4j.properties files that refer to it (hello-samza).
# Wrote a JsonHelpers class to help with all the terrible back bending we have to do to make Scala work with Jackson.
# Updated Util to remove the compress methods, and add a helper "read" method for reading from HTTP URLs.
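
To illustrate the container side of the changes above, here is a minimal sketch in Java of reading a URL from an environment variable and fetching its body. It is not the code in the patch: the class name HttpReadSketch, the readFromEnv method, and the environment variable name passed to it are assumptions made for this example, and the real helper lives in Util.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper class; the name and signatures are illustrative only.
class HttpReadSketch {
    /** Reads the whole body served at the given URL and decodes it as UTF-8. */
    static String read(URL url, int timeoutMs) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeoutMs);
        conn.setReadTimeout(timeoutMs);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /**
     * Looks up a URL in the supplied environment map and fetches its body.
     * The variable name is passed in because the real name is not stated here.
     */
    static String readFromEnv(Map<String, String> env, String varName, int timeoutMs)
            throws IOException {
        String raw = env.get(varName);
        if (raw == null) {
            throw new IllegalStateException("Missing environment variable: " + varName);
        }
        return read(URI.create(raw).toURL(), timeoutMs);
    }
}
```

A container would call something like HttpReadSketch.readFromEnv(System.getenv(), name, timeout) once at startup and then parse the returned JSON. Taking the environment as a map keeps the sketch easy to exercise without a running server.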
Remaining items in this ticket:
# Write tests.
Tickets I'd like to open as follow-ons:
# Remove TaskNamesToSystemStreamPartitions, and convert to a proper job/container/task data model. Make the data model Java based, and use Jackson annotations.
# Convert the HttpServer to be a proper Jetty/Jersey/Jackson server that uses the properly defined data model to serve its content.
> Remove TaskLifecycleListener
> ----------------------------
>
> Key: SAMZA-437
> URL: https://issues.apache.org/jira/browse/SAMZA-437
> Project: Samza
> Issue Type: Bug
> Components: container
> Reporter: Chris Riccomini
>
> We recently had a use case where we needed to wrap the Samza process() method in some code. The TaskLifecycleListener was insufficient to do this. We get a beforeProcess and afterProcess, but what we really wanted was:
> {code}
> def wrapProcess(...) {
>   foo.doSomething(new Wrapper() {
>     task.process(...)
>   })
> }
> {code}
> We ended up just writing a wrapper task, and having the normal code defined via a subtask config:
> {noformat}
> task.class=foo.bar.WrapperTask
> task.subtask.class=foo.bar.NormalTask
> {noformat}
> Both of these tasks implement StreamTask. Samza just sees WrapperTask, and treats it like a normal task. WrapperTask instantiates the subtask and manages its lifecycle internally.
> This approach seems superior to the TaskLifecycleListener.
> * Allows tasks to be composed multiple times.
> * Removes this complexity from the Samza framework, and makes it a concern of the job owner.
> * Allows the wrapper task to do things like filtering messages, tweaking configs and serialization, catching exceptions, etc.
> Given this, it seems that TaskLifecycleListener is a degenerate case, and adds complexity to the framework. I propose removing it.
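
The wrapper-task pattern described above could be sketched in Java roughly as follows. This is an illustration under assumptions, not code from the ticket: SimpleTask is a simplified stand-in for Samza's StreamTask (which has a different process signature), and the config is modeled as a plain map. Only the task.subtask.class key comes from the description.

```java
import java.util.List;
import java.util.Map;

// Simplified stand-in for Samza's StreamTask (assumption for this sketch).
interface SimpleTask {
    void process(String message, List<String> output) throws Exception;
}

// Hypothetical "normal" task holding the actual business logic.
class NormalTask implements SimpleTask {
    @Override
    public void process(String message, List<String> output) {
        output.add(message.toUpperCase());
    }
}

// The framework only sees this task; it creates and drives the subtask itself.
class WrapperTask implements SimpleTask {
    private final SimpleTask subtask;

    WrapperTask(Map<String, String> config) throws Exception {
        // Instantiate the subtask reflectively from the subtask config key.
        String className = config.get("task.subtask.class");
        if (className == null) {
            throw new IllegalArgumentException("task.subtask.class is not set");
        }
        subtask = (SimpleTask) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
    }

    @Override
    public void process(String message, List<String> output) throws Exception {
        // Filtering: drop empty messages before they reach the subtask.
        if (message.isEmpty()) {
            return;
        }
        try {
            // Code surrounding the wrapped call goes here, which separate
            // before/after callbacks cannot express as a single scope.
            subtask.process(message, output);
        } catch (RuntimeException e) {
            // Exception handling is owned by the wrapper, not the framework.
            output.add("error: " + e.getMessage());
        }
    }
}
```

Because WrapperTask implements the same interface as the task it wraps, wrappers can be nested: a second wrapper could take a WrapperTask as its subtask without any change to the framework.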
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)