Posted to common-user@hadoop.apache.org by John Conwell <jo...@iamjohn.me> on 2013/11/25 19:06:48 UTC

Map/Reduce/Driver jar(s) organization

I'm curious what some best practices are for structuring jars for a
business framework that uses Map/Reduce. Note: this assumes you aren't
invoking MR manually via the command line, but have Hadoop integrated into
a larger business framework that invokes MR jobs programmatically.

By "business framework" I mean an architecture that includes a services
component (REST, app server, whatever), business domain logic, Hadoop MR
jobs, and so on.

Here are some common code artifacts in such an architecture:
* Map/Reduce classes
* Hadoop Driver classes that configure the MR job and invoke them
* Biz Domain classes that invoke the Hadoop driver classes, within the
context of some business process
* Services classes that interface between user calls/system events and biz
domain logic
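For the driver piece specifically, the pattern I have in mind is something
like the sketch below: a driver that extends Configured and implements
Tool, so the biz domain layer can invoke it programmatically via
ToolRunner.run() rather than from the command line. (Class and job names
here are made up for illustration; the mapper/reducer are Hadoop's stock
TokenCounterMapper/IntSumReducer just to keep the sketch self-contained.)

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver class: configures and submits one MR job.
public class WordCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() returns whatever Configuration the caller passed in,
        // so the biz layer can inject cluster settings programmatically.
        Job job = Job.getInstance(getConf(), "word-count");

        // Ties the job to the jar containing this class -- which is why
        // the driver/MR packaging question matters.
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(TokenCounterMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Optional CLI entry point; a biz domain class would instead call
    // ToolRunner.run(conf, new WordCountDriver(), args) directly.
    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(),
                                   new WordCountDriver(), args));
    }
}
```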

Are most people creating monolithic jars that have all classes for all
layers? Separating all Hadoop-related classes from domain-level classes?
Are you putting the MR classes in the same jar as the Hadoop driver
classes, or in separate jars?
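To make the question concrete, here's one split I could imagine (module
names are hypothetical, Maven-style multi-module layout assumed):

```
biz-framework/
  framework-services/   REST / app-server entry points
  framework-domain/     biz domain logic; depends on hadoop-drivers
  hadoop-drivers/       driver classes that configure and submit jobs
  hadoop-mr/            Mapper/Reducer classes; the jar shipped to the cluster
```

versus the monolithic alternative where all four layers live in one jar.
Is a split like this worth the build complexity, or do people just ship
one fat jar?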

Thanks,
Turbo

-- 

Thanks,
John C