Posted to commits@samza.apache.org by "Jakob Homan (JIRA)" <ji...@apache.org> on 2013/11/26 22:55:35 UTC

[jira] [Commented] (SAMZA-11) Refactor Samza subprojects

    [ https://issues.apache.org/jira/browse/SAMZA-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833123#comment-13833123 ] 

Jakob Homan commented on SAMZA-11:
----------------------------------

Hadoop has a similar process: the classes annotated for public use are documented in one javadoc run, and the rest in another. That code is ripe for stealing.
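A minimal sketch of the underlying idea, assuming runtime-retained audience annotations. The names `Public` and `Framework` are hypothetical, borrowed from the @Public/@Framework option in the issue below, and the two interfaces are empty stand-ins. This is not Hadoop's code, which does the filtering at javadoc time. It only shows how a single annotation lets a tool partition classes into a user-facing set and a framework-facing set:

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

public class AudienceSplit {
    // Hypothetical audience markers; RUNTIME retention so reflection can see them.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Public {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Framework {}

    // Hypothetical, empty stand-ins for a user-facing and a framework-facing interface.
    @Public public interface StreamTask {}
    @Framework public interface SystemConsumer {}

    /** Returns the classes that carry the given audience annotation, in input order. */
    static List<Class<?>> forAudience(List<Class<?>> classes, Class<? extends Annotation> audience) {
        List<Class<?>> selected = new ArrayList<>();
        for (Class<?> c : classes) {
            if (c.isAnnotationPresent(audience)) {
                selected.add(c);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Class<?>> all = List.of(StreamTask.class, SystemConsumer.class);
        // One "doc set" per audience: each run would only see its own classes.
        for (Class<?> c : forAudience(all, Public.class)) {
            System.out.println("user docs: " + c.getSimpleName());
        }
        for (Class<?> c : forAudience(all, Framework.class)) {
            System.out.println("framework docs: " + c.getSimpleName());
        }
    }
}
```

A javadoc-based version would apply the same annotation check inside a doclet instead of via reflection, so no compiled classes would be needed.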

> Refactor Samza subprojects
> --------------------------
>
>                 Key: SAMZA-11
>                 URL: https://issues.apache.org/jira/browse/SAMZA-11
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>
> In a recent merge, I refactored some packaging in samza-api, so that not everything was just in the samza root package space. I did the package re-org based on logical grouping (i.e. classes that were similar in some logical way would end up in the same package space).
> There's been some discussion about the package structure. Right now, the samza-api package has a mixture of framework-level interfaces and public (user-facing) interfaces. For example, SystemConsumer is a framework interface, but StreamTask is a public user-facing interface.
> Question is: does it make sense to split these two interface groups up in some way? If so, how?
> I think it does make sense to split the interfaces up, because it's basically free to do, and will make it very obvious (based on where you put an interface) what the contract is with the parties involved (framework implementor, and stream task implementor). It will also allow us to generate two different sets of Javadocs: one for framework developers, and one for regular StreamTask developers. Lastly, it'll let the average developer just go to a specific subfolder, and learn about Samza, without having to pay attention to things like serializers, system consumers, etc.
> I can think of three ways to do this:
> 1. Annotations (@Public, @Framework).
> 2. Within samza-api, add two package spaces: samza.api, and samza.framework.
> 3. Create two separate artifacts/projects: samza-api, and samza-framework-api.
> I prefer the third approach. First, it means we can keep the package spaces the same (samza.task rather than samza.api.task, samza.system rather than samza.framework.system, etc.). Second, it means we can generate two completely separate Javadoc directories. This is kind of cool because it should make things more straightforward for the average user to pick up. Third, it actually means that you could implement an entirely separate framework to run Samza tasks without using our framework API. I'm not saying this is a good idea, but it's generally a good sign when things line up this way.
> My proposed change would be:
> Refactor Samza API to:
>  * samza-task-api
>  * samza-framework-api (depends on samza-task-api)
> Refactor samza-core to:
>  * samza-util (depends on samza-task-api)
>  * samza-container (depends on samza-task-api, samza-framework-api, and samza-util)
> The reason for the container/util split is that there are some other subprojects (samza-kafka, and samza-yarn) that pull in all of core just to get access to some util classes. This split would allow other subprojects (and other projects, in general) to pull in util stuff without pulling in all of container, which they don't need.
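The benefit of the container/util split can be checked by computing each module's transitive dependencies. A minimal sketch, assuming the direct dependencies listed in the proposal; the samza-kafka edges (samza-framework-api and samza-util) are a hypothetical guess, not part of the proposal. The class and method names are illustrative only and do not correspond to any build tool's API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ModuleGraph {
    // Direct dependencies from the proposal. The samza-kafka entry is an assumption.
    static final Map<String, List<String>> DEPS = new LinkedHashMap<>();
    static {
        DEPS.put("samza-task-api", List.of());
        DEPS.put("samza-framework-api", List.of("samza-task-api"));
        DEPS.put("samza-util", List.of("samza-task-api"));
        DEPS.put("samza-container", List.of("samza-task-api", "samza-framework-api", "samza-util"));
        DEPS.put("samza-kafka", List.of("samza-framework-api", "samza-util"));
    }

    /** Everything a module pulls in, directly or transitively, excluding itself. */
    static Set<String> transitiveDeps(String module) {
        Set<String> seen = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(DEPS.getOrDefault(module, List.of()));
        while (!work.isEmpty()) {
            String next = work.pop();
            // Only expand a module the first time we see it (also guards against cycles).
            if (seen.add(next)) {
                work.addAll(DEPS.getOrDefault(next, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        for (String module : DEPS.keySet()) {
            System.out.println(module + " -> " + transitiveDeps(module));
        }
    }
}
```

Under these assumptions, samza-kafka's closure is samza-framework-api, samza-task-api, and samza-util, with no samza-container. Pulling in util no longer drags in the whole container, which is the point of the split.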



--
This message was sent by Atlassian JIRA
(v6.1#6144)