Posted to commits@samza.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2014/07/18 19:36:05 UTC

[jira] [Commented] (SAMZA-348) Configure Samza jobs through a stream

    [ https://issues.apache.org/jira/browse/SAMZA-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066578#comment-14066578 ] 

Chris Riccomini commented on SAMZA-348:
---------------------------------------

We are discussing storing checkpoints in the ConfigLog. The way we handle checkpoints might change if we want to leverage Kafka's proposed transactional messaging feature:

https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka

I believe that this feature requires us to store checkpoints in Kafka, since the offset commit and transaction commit must happen atomically. I need to re-read the design doc to refresh my memory, but if this holds true, then the ConfigLog wouldn't be useful for storing Kafka offsets. It might still be useful for other systems (e.g. file system), though.
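
To illustrate why the atomicity matters, here is a minimal toy sketch. It is not Kafka's API and not the design from the wiki page above; both function names and the in-memory "stores" are hypothetical. It only contrasts a checkpoint written separately from the output with a checkpoint committed together with the output, when a crash happens between the two writes.

```python
def run_with_separate_checkpoint(messages):
    """Output and checkpoint live in different stores, written one after the other."""
    output = []      # stands in for the output topic
    checkpoint = 0   # stands in for a separate checkpoint store (hypothetical)

    # First attempt: the output is written, then the container dies
    # before the checkpoint write happens.
    output.extend(m.upper() for m in messages[checkpoint:])
    # (simulated crash here: checkpoint is still 0)

    # Restart: resume from the stale checkpoint and reprocess everything.
    output.extend(m.upper() for m in messages[checkpoint:])
    checkpoint = len(messages)
    return output, checkpoint


def run_with_atomic_commit(messages):
    """Output and offset are committed together, in one all-or-nothing step."""
    committed = {"output": [], "offset": 0}

    # First attempt: output and offset are staged together, and the
    # container dies before the commit, so nothing becomes visible.
    start = committed["offset"]
    staged = {
        "output": committed["output"] + [m.upper() for m in messages[start:]],
        "offset": len(messages),
    }
    # (simulated crash here: staged is discarded, committed is unchanged)

    # Restart: redo the work and commit output and offset in one step.
    start = committed["offset"]
    committed = {
        "output": committed["output"] + [m.upper() for m in messages[start:]],
        "offset": len(messages),
    }
    return committed["output"], committed["offset"]


print(run_with_separate_checkpoint(["a", "b"]))  # (['A', 'B', 'A', 'B'], 2)
print(run_with_atomic_commit(["a", "b"]))        # (['A', 'B'], 2)
```

In the first case the output is duplicated after the restart because the checkpoint lagged behind the output. In the second case the two cannot diverge, which is why the checkpoint would have to live in the same system that performs the commit.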

> Configure Samza jobs through a stream
> -------------------------------------
>
>                 Key: SAMZA-348
>                 URL: https://issues.apache.org/jira/browse/SAMZA-348
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Chris Riccomini
>
> Samza's existing config setup is problematic for a number of reasons:
> # It's completely immutable once a job starts. This prevents any dynamic reconfiguration and auto-scaling. It is debatable whether we want these features or not, but our existing implementation actively prevents them. See SAMZA-334 for discussion.
> # We pass existing configuration through environment variables. YARN exports environment variables in a shell script, which limits their size to the maximum argument length on the machine. This is usually ~128KB. See SAMZA-333 and SAMZA-337 for details.
> # User-defined configuration (the Config object) and programmatic configuration (checkpoints and TaskName:State mappings (see SAMZA-123)) are handled differently. It's debatable whether this makes sense.
> In SAMZA-123, [~jghoman] and I propose implementing a ConfigLog. This log would replace both the checkpoint topic and the existing config environment variables in SamzaContainer and Samza's YARN AM.
> I'd like to keep this ticket's scope limited to just the implementation of the ConfigLog, and not re-designing how Samza's config is used in the code (SAMZA-40). We should, however, discuss how this feature would affect dynamic reconfiguration/auto-scaling.
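
As a rough illustration of the ConfigLog idea described above, the sketch below models it as an append-only list of key/value records that a container replays on startup, with later records overriding earlier ones. This is an assumption about how such a log could behave, not the SAMZA-123 design; the `replay` function, the key prefixes, and the use of `None` as a delete marker are all hypothetical.

```python
def replay(log):
    """Fold an append-only list of (key, value) records into the current state.

    Later records for the same key win. A value of None removes the key
    (a hypothetical delete marker, chosen for this sketch only).
    """
    state = {}
    for key, value in log:
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


# Hypothetical log holding both user-defined config and checkpoints.
config_log = [
    ("config/job.name", "page-view-counter"),
    ("checkpoint/task-0", "offset=100"),
    ("config/task.window.ms", "60000"),
    ("checkpoint/task-0", "offset=250"),  # newer checkpoint overrides the old one
    ("config/task.window.ms", None),      # config entry removed after job start
]

state = replay(config_log)
print(state)
# {'config/job.name': 'page-view-counter', 'checkpoint/task-0': 'offset=250'}
```

Under these assumptions, config and checkpoints are read the same way, nothing has to fit into environment variables, and a change appended after the job starts becomes visible to any container that keeps reading the log.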



--
This message was sent by Atlassian JIRA
(v6.2#6252)