Posted to commits@samza.apache.org by "Roger Hoover (JIRA)" <ji...@apache.org> on 2014/09/10 00:45:29 UTC

[jira] [Commented] (SAMZA-40) Refactor Samza configuration

    [ https://issues.apache.org/jira/browse/SAMZA-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127704#comment-14127704 ] 

Roger Hoover commented on SAMZA-40:
-----------------------------------

Here are some things that come to mind, though I haven't really thought them through:

- What about a way to specify a DAG for the job? From the developer's point of view, she mostly cares about the data flow.  Maybe there could be a pluggable naming scheme for topics in between jobs so that you don't have to explicitly name them???  You'd want a nice way to specify this.  YAML??  Using job-name:

wikipedia-feed
  - wikipedia-parser
      - wikipedia-stats

Ideally, that would be enough to wire everything together???
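
To make the idea a bit more concrete, here is a rough sketch of how an indented job list like the one above could be turned into wiring, with the intermediate topic names coming from a pluggable naming scheme. Everything in it is an assumption for illustration: JobDagSketch, TopicNamer, Edge and the "-to-" naming rule are made up and are not part of Samza.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: none of these names exist in Samza. */
final class JobDagSketch {

    /** Pluggable naming scheme: derives the topic between two jobs (an assumption). */
    interface TopicNamer {
        String topicFor(String upstreamJob, String downstreamJob);
    }

    /** One edge of the DAG plus the derived intermediate topic. */
    static final class Edge {
        final String from;
        final String to;
        final String topic;

        Edge(String from, String to, String topic) {
            this.from = from;
            this.to = to;
            this.topic = topic;
        }

        @Override
        public String toString() {
            return from + " -> " + to + " via " + topic;
        }
    }

    /** Parses the indented job list; deeper indentation means "consumes from the job above". */
    static List<Edge> parse(String spec, TopicNamer namer) {
        List<Edge> edges = new ArrayList<>();
        Deque<Integer> indents = new ArrayDeque<>();
        Deque<String> jobs = new ArrayDeque<>();
        for (String line : spec.split("\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            // Count leading spaces to find the nesting level of this job.
            int indent = 0;
            while (line.charAt(indent) == ' ') {
                indent++;
            }
            String name = line.trim();
            if (name.startsWith("- ")) {
                name = name.substring(2).trim();
            }
            // Pop siblings and deeper entries so the top of the stack is the parent.
            while (!indents.isEmpty() && indents.peek() >= indent) {
                indents.pop();
                jobs.pop();
            }
            if (!jobs.isEmpty()) {
                String parent = jobs.peek();
                edges.add(new Edge(parent, name, namer.topicFor(parent, name)));
            }
            indents.push(indent);
            jobs.push(name);
        }
        return edges;
    }

    public static void main(String[] args) {
        String spec = "wikipedia-feed\n"
                + "  - wikipedia-parser\n"
                + "      - wikipedia-stats\n";
        // The naming rule is the pluggable part; this one is just an example.
        for (Edge e : parse(spec, (up, down) -> up + "-to-" + down)) {
            System.out.println(e);
        }
    }
}
```

Swapping the lambda passed to parse is all it would take to change the naming convention, which is the point of keeping it pluggable.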

- Support a programmatic, code-level API for building, validating, and deploying jobs?  Hopefully, this would make it possible to build higher-level frameworks on top that could dynamically generate jobs.  I don't know if I'd ever want to do this but if the API is there, you never know what will spring up.
- Support for validation during build and during runtime initialization to catch errors early.
- Can sensible defaults make the config less verbose?
  - What about on/off switches for things like metrics and checkpointing?  If you don't specify otherwise, you get the default metrics package and Kafka checkpointing.
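
As a rough illustration of the last few points taken together, a code-level builder could fill in defaults, expose on/off switches, and validate everything before the job is ever deployed. This is only a sketch under assumptions: JobConfigSketch, its methods, and the keys "metrics.enabled", "checkpoint.enabled" and "checkpoint.system" are invented here and are not Samza's actual API or config names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: class, method, and key names are illustrative, not Samza's API. */
final class JobConfigSketch {

    private final Map<String, String> values;

    private JobConfigSketch(Map<String, String> values) {
        // The built config is immutable; only the builder can change values.
        this.values = Collections.unmodifiableMap(values);
    }

    String get(String key) {
        return values.get(key);
    }

    static Builder builder(String jobName) {
        return new Builder(jobName);
    }

    static final class Builder {
        private final Map<String, String> values = new LinkedHashMap<>();

        private Builder(String jobName) {
            values.put("job.name", jobName);
            // Defaults (assumed keys): metrics on, checkpointing on and backed by Kafka.
            values.put("metrics.enabled", "true");
            values.put("checkpoint.enabled", "true");
            values.put("checkpoint.system", "kafka");
        }

        Builder input(String stream) {
            values.put("task.inputs", stream);
            return this;
        }

        Builder metrics(boolean on) {
            values.put("metrics.enabled", Boolean.toString(on));
            return this;
        }

        Builder checkpointing(boolean on) {
            values.put("checkpoint.enabled", Boolean.toString(on));
            return this;
        }

        /** Collects every problem up front instead of failing later at runtime. */
        List<String> validate() {
            List<String> errors = new ArrayList<>();
            String name = values.get("job.name");
            if (name == null || name.trim().isEmpty()) {
                errors.add("job.name must not be empty");
            }
            if (!values.containsKey("task.inputs")) {
                errors.add("task.inputs must be set");
            }
            return errors;
        }

        JobConfigSketch build() {
            List<String> errors = validate();
            if (!errors.isEmpty()) {
                throw new IllegalArgumentException("Invalid job config: " + errors);
            }
            return new JobConfigSketch(new LinkedHashMap<>(values));
        }
    }

    public static void main(String[] args) {
        JobConfigSketch config = JobConfigSketch.builder("wikipedia-parser")
                .input("kafka.wikipedia-raw")
                .metrics(false) // explicit off switch; checkpointing keeps its default
                .build();
        System.out.println(config.get("metrics.enabled"));
        System.out.println(config.get("checkpoint.system"));
    }
}
```

The same validate() call could run both at build time and again during runtime initialization, and a higher-level framework could drive the builder to generate jobs dynamically.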

> Refactor Samza configuration
> ----------------------------
>
>                 Key: SAMZA-40
>                 URL: https://issues.apache.org/jira/browse/SAMZA-40
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>              Labels: project
>
> Samza's configuration system has several problems that we need to resolve.
> * Want to auto-generate documentation based off of configuration.
> * Should support global defaults for a config property. Right now, we do config.getFoo.getOrElse() everywhere.
> * Should validate config up front, rather than throwing runtime exceptions randomly throughout the code.
> * We are mixing wiring and configuration together. How do other systems handle this?
> * We have fragmented configuration (anybody can define configuration). How do other systems handle this?
> * How to handle undefined configuration? How to make this interoperable with both Java and Scala (i.e. should we support Option in Scala)?
> * Should remain immutable.
> * Should remove implicits. It's just confusing.
> * Do we want to support complex types (list, map) for values, not just String?
> We need a design proposal for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)