
Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2010/09/26 11:57:32 UTC

[jira] Resolved: (MAHOUT-348) Trainer jobs should implement Hadoop's Tool

     [ https://issues.apache.org/jira/browse/MAHOUT-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-348.
------------------------------

      Assignee: Sean Owen
    Resolution: Duplicate

I agree with this, though I would consider it subsumed by MAHOUT-167 and MAHOUT-294, as they are about using "AbstractJob", which implements Tool.

> Trainer jobs should implement Hadoop's Tool
> -------------------------------------------
>
>                 Key: MAHOUT-348
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-348
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification
>    Affects Versions: 0.3
>            Reporter: Ferdy
>            Assignee: Sean Owen
>
> It would be nice if the Trainer jobs (and Mahout jobs in general, those not already doing so) implemented Tool. From Hadoop's javadocs:
> "Tool, is the standard for any Map-Reduce tool/application. The tool/application should delegate the handling of standard command-line options to ToolRunner.run(Tool, String[]) and only handle its custom arguments."
> The problem we are currently running into is that, as of Mahout 0.3, there is no way to submit a CBayesDriver job with a custom Configuration. Therefore it is not possible to set the classpath correctly for its Mappers and Reducers if one runs CBayesDriver with the generic "-libjars" option. Of course, this particular problem could be solved by putting the required jars in the Hadoop lib dir; however, this is not always possible. On a custom Hadoop deployment (shared among many users and different types of jobs), every job should be able to specify its own library dependencies.
> Note: I'm aware of issue MAHOUT-167, which has limited overlap with this issue: MAHOUT-167 states that the new API should be used (particularly for Clustering jobs). This issue addresses the need to implement a Hadoop job interface at all, preferably Tool.
> Also, there's issue MAHOUT-294, an effort to track all changes surrounding the Job API.
> Let me hear your thoughts, and I'll whip up a patch when needed.
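
A minimal sketch of the pattern the issue asks for, assuming Hadoop's org.apache.hadoop.util.Tool and ToolRunner are on the classpath. The class name TrainerDriver and the property example.setting are hypothetical; this is not Mahout's CBayesDriver or AbstractJob. ToolRunner.run hands the arguments to GenericOptionsParser, which applies generic options such as -D and -libjars to the Configuration and passes only the remaining (custom) arguments to run().

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * Hypothetical trainer driver illustrating the Tool/ToolRunner pattern.
 * Extending Configured provides the getConf()/setConf() storage that
 * Tool (via Configurable) requires.
 */
public class TrainerDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // ToolRunner has already stripped generic options (-D, -libjars, ...)
    // from args and applied them to getConf(); only custom args remain.
    if (args.length != 2) {
      System.err.println("Usage: TrainerDriver [generic options] <input> <output>");
      return -1;
    }
    Configuration conf = getConf();
    String input = args[0];
    String output = args[1];
    // A real driver would build its MapReduce job from this same
    // Configuration object, so that settings made via -libjars and -D
    // reach the Mappers and Reducers. Job setup is omitted in this sketch.
    System.out.println("input=" + input + " output=" + output
        + " example.setting=" + conf.get("example.setting"));
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new Configuration(), new TrainerDriver(), args);
    System.exit(exitCode);
  }
}
```

Under these assumptions, a user could then ship per-job dependencies without touching the Hadoop lib dir, e.g. `hadoop jar trainer.jar TrainerDriver -libjars dep1.jar,dep2.jar input output` (jar and class names hypothetical). Generic options must precede the custom arguments.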

-- 
This message is automatically generated by JIRA.