Posted to dev@crunch.apache.org by Vinod Kumar Vavilapalli <vi...@hortonworks.com> on 2012/09/10 02:51:20 UTC

Splitting the core crunch module

Hi folks,

I'm getting back up to speed after a long break; I was off the grid.

Looking at the code, it seems to me that the API is somewhat intermixed with implementation details. So I opened https://issues.apache.org/jira/browse/CRUNCH-60 with a proposal. Please let me know what you think.

This could be a somewhat intrusive change right now, but I believe it would help us a lot in the long run.

Thanks,
+Vinod

Re: Splitting the core crunch module

Posted by Matthias Friedrich <ma...@mafr.de>.
Hi Vinod,

First of all, welcome (back). I believe we haven't met :)

Splitting Crunch is on my agenda, too, but I haven't been able to come
up with a game plan yet (and I needed a break after all the dependency
cleanup work and the HBase split). I think it's a great idea, and we
should definitely do it.

Unfortunately, it's a bit complicated because right now there are lots
of cyclic package dependencies (see [1], the picture there shows Crunch's
dependency graph). Splitting stuff into modules is going to require quite
a bit of refactoring because we have to cut dependencies.
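
To make "cyclic package dependencies" concrete, here is a minimal sketch
of how a cycle could be found with a depth-first search over a package
dependency map. The class name PackageCycleCheck is hypothetical, and the
edges in main() are made up for illustration; they are not measured from
the Crunch code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Crunch.
final class PackageCycleCheck {

    /** Returns one dependency cycle as a list of package names, or an empty list if there is none. */
    static List<String> findCycle(Map<String, Set<String>> deps) {
        Set<String> done = new HashSet<>();      // packages fully explored without finding a cycle
        List<String> path = new ArrayList<>();   // packages on the current DFS path
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return new ArrayList<>();
    }

    private static List<String> visit(String pkg, Map<String, Set<String>> deps,
                                      Set<String> done, List<String> path) {
        int idx = path.indexOf(pkg);
        if (idx >= 0) {
            // pkg is already on the current path: the slice from idx onward is a cycle.
            List<String> cycle = new ArrayList<>(path.subList(idx, path.size()));
            cycle.add(pkg);
            return cycle;
        }
        if (done.contains(pkg)) {
            return new ArrayList<>();
        }
        path.add(pkg);
        for (String dep : deps.getOrDefault(pkg, Collections.<String>emptySet())) {
            List<String> cycle = visit(dep, deps, done, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);  // backtrack
        done.add(pkg);
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        // Illustrative edges only (assumption), chosen to form a loop.
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("org.apache.crunch", Collections.singleton("org.apache.crunch.io"));
        deps.put("org.apache.crunch.io", Collections.singleton("org.apache.crunch.impl.mr"));
        deps.put("org.apache.crunch.impl.mr", Collections.singleton("org.apache.crunch"));
        System.out.println(findCycle(deps));
    }
}
```

Every edge on a reported cycle is a candidate for cutting before the
packages involved can end up in separate modules.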

I think we should first draw a high-level package diagram (just the top
packages) that shows which package depends on which. As per Robert C.
Martin's SOLID principles, interface packages should not depend on
implementation packages. Then we can assign the existing classes to
packages and refactor if necessary.
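
The rule above could be checked mechanically once each top-level package
has been labeled. The following is only a sketch under assumptions: the
LayeringCheck class, the Layer enum, and the package labels and edges in
main() are all hypothetical and do not describe Crunch's actual layout.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Crunch.
final class LayeringCheck {

    enum Layer { API, IMPL }

    /** Lists every edge where an API package depends on an implementation package. */
    static List<String> violations(Map<String, Layer> layers, Map<String, Set<String>> deps) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : deps.entrySet()) {
            if (layers.get(entry.getKey()) != Layer.API) {
                continue;  // only API packages are restricted by this rule
            }
            for (String target : entry.getValue()) {
                if (layers.get(target) == Layer.IMPL) {
                    result.add(entry.getKey() + " -> " + target);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Labels and edges are assumptions made for the example.
        Map<String, Layer> layers = new LinkedHashMap<>();
        layers.put("org.apache.crunch", Layer.API);
        layers.put("org.apache.crunch.io", Layer.IMPL);

        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("org.apache.crunch", Collections.singleton("org.apache.crunch.io"));  // not allowed
        deps.put("org.apache.crunch.io", Collections.singleton("org.apache.crunch"));  // allowed

        System.out.println(violations(layers, deps));
    }
}
```

An empty result would mean the package diagram already satisfies the
rule; each reported edge points at a class that needs to move or be
refactored.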

As an example, the "io" package looks to me like it should be an
implementation package; I'd move the interfaces (PathTarget, OutputHandler
etc.) to the client API package ("org.apache.crunch" currently) to separate
them from implementations like From, To, and At.
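
A rough sketch of the kind of split I have in mind, with both "packages"
shown as nested types in one file so it compiles on its own. All names
here (SketchTarget, SketchFileTarget, SketchTo) are hypothetical
stand-ins and do not reflect the real signatures of PathTarget, To, or
any other Crunch class.

```java
// Hypothetical illustration, not Crunch code.
final class ApiImplSplitSketch {

    // Would live in the client API package: refers to no implementation type.
    interface SketchTarget {
        String describe();
    }

    // Would live in an implementation package such as "io":
    // depends on the API interface, never the other way around.
    static final class SketchFileTarget implements SketchTarget {
        private final String path;

        SketchFileTarget(String path) {
            this.path = path;
        }

        @Override
        public String describe() {
            return "file:" + path;
        }
    }

    // Static factory in the implementation package; callers only see the API type.
    static final class SketchTo {
        static SketchTarget file(String path) {
            return new SketchFileTarget(path);
        }
    }

    public static void main(String[] args) {
        SketchTarget target = SketchTo.file("/tmp/out");
        System.out.println(target.describe());
    }
}
```

With this arrangement the API module can be compiled without the
implementation module on the classpath, which is the property a module
split would rely on.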

Regards,
  Matthias

[1] http://blog.mafr.de/2012/08/26/visualizing-package-dependencies/
