Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2011/05/31 19:14:47 UTC
[jira] [Updated] (MAHOUT-663) Rationalize hadoop job creation with respect to setJarByClass
[ https://issues.apache.org/jira/browse/MAHOUT-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-663:
-----------------------------
Affects Version/s: 0.5
Fix Version/s: 0.6
I agree that this can and should be changed.
I think the current behavior is sort of necessary: you have to tell Hadoop where the classes are, and calling setJarByClass with the driver class makes a very decent guess. But it's at best a good default and needs to be changeable.
The good news is that this is more or less how AbstractJob already works: you get a Job object that you further configure and run. But I think you're suggesting that the jar setting should really be specifiable on the command line somehow, as a common argument that overrides the default? Sounds easy and good to me.
The broader point you're making, I think, is that there should be a more consistent structure to "Drivers", and that the structure ought to be X, Y or Z. I couldn't agree more. That's a long-standing problem, and one I've been trying to address by pushing drivers toward AbstractJob. Once it's all in one place, it's easy to make that one approach do X, Y or Z as we like. So I'd suggest that consolidating on AbstractJob is the way forward, beyond the change I suggest above.
> Rationalize hadoop job creation with respect to setJarByClass
> -------------------------------------------------------------
>
> Key: MAHOUT-663
> URL: https://issues.apache.org/jira/browse/MAHOUT-663
> Project: Mahout
> Issue Type: Bug
> Components: build
> Affects Versions: 0.4, 0.5
> Reporter: Benson Margulies
> Fix For: 0.6
>
>
> Mahout includes a series of driver classes that create Hadoop jobs via static methods.
> Each one of these calls job.setJarByClass(itself.class).
> Unfortunately, this subverts Hadoop's support for putting additional jars in the lib directory of a job jar, since the class passed in is not a class that lives in the ordinary section of the job jar.
> The effect of this is to force users of Mahout (and Mahout's own example job jar) to unpack the mahout-core jar into the main section, instead of just treating it as a 'lib' dependency.
> It seems to me that each static job creator should be refactored into two parts: a public function that returns a job object (and does NOT call waitForCompletion), and the existing wrapper, which calls that function and then runs the job. Users could then call the new functions and make their own call to setJarByClass.
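
For illustration only, here is a rough sketch of the split described in the issue: a factory method that returns a configured but unstarted job, plus a thin wrapper that keeps the old run-immediately behavior. Everything in it is an assumption made for the example. FakeJob is a hypothetical stand-in for Hadoop's Job so the sketch compiles without Hadoop on the classpath, and ExampleDriver, createJob, runJob and UserMain are invented names, not existing Mahout classes or methods.

```java
import java.util.Objects;

public class JobFactorySketch {

    /** Hypothetical stand-in for Hadoop's Job; it only records what was configured. */
    static final class FakeJob {
        private final String name;
        private Class<?> jarClass;
        private boolean completed;

        FakeJob(String name) { this.name = name; }

        String getName() { return name; }
        void setJarByClass(Class<?> cls) { this.jarClass = Objects.requireNonNull(cls); }
        Class<?> getJarClass() { return jarClass; }
        boolean isCompleted() { return completed; }

        // Pretends to run the job and reports success.
        boolean waitForCompletion(boolean verbose) {
            completed = true;
            return true;
        }
    }

    /** Hypothetical driver, split into a factory method and a wrapper. */
    static final class ExampleDriver {
        // Factory: configures the job and applies a default jar class, but does NOT run it.
        static FakeJob createJob(String name) {
            FakeJob job = new FakeJob(name);
            job.setJarByClass(ExampleDriver.class); // default guess; the caller may override it
            return job;
        }

        // Wrapper: preserves the old behavior of creating and running in one call.
        static boolean runJob(String name) {
            return createJob(name).waitForCompletion(true);
        }
    }

    /** Hypothetical user class, assumed to live in the main section of the user's job jar. */
    static final class UserMain {}

    public static void main(String[] args) {
        FakeJob job = ExampleDriver.createJob("example");
        job.setJarByClass(UserMain.class); // the user overrides the default before running
        boolean ok = job.waitForCompletion(true);
        System.out.println(job.getJarClass().getSimpleName() + " " + ok); // prints "UserMain true"
    }
}
```

The point of the split is that the default jar class stays in place for callers of the wrapper, while callers of the factory method can point the job at a class in their own jar before running it.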