You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2011/02/14 14:33:57 UTC

[jira] Commented: (MAHOUT-593) Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure compatibility with current Mahout dependencies.

    [ https://issues.apache.org/jira/browse/MAHOUT-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994289#comment-12994289 ] 

Sean Owen commented on MAHOUT-593:
----------------------------------

This is looking good and so I want to get this on into the codebase. I have some broad suggestions from a first glance.

* You may need to regenerate this patch as I think a few recent changes will conflict
* I think it's a decent convention to not have Mappers and Reducers as inner classes. They can be top-level classes in the same package. It may be personal preference but I think much of our MapReduce code does this. In general I think lots of inner classes that don't need to be inner just makes things hard to find.
* It's also preferable, I think to try to standardize how the Job is implemented. I believe we're trying to use this AbstractJob class as the template for Job implementations. Can this be put into that mold? I see at least one implementation does this.
* Spacing looks uneven -- should be 2 spaces per indent
* Might look at the coding conventions in surrounding code -- there are a fair number of differences. For example we don't use the "m_" convention for members. You might run "mvn checkstyle:checkstyle" for a complete list of what's at odds with the project style.
* Apache code doesn't usually have @author tags
* Don't throw RuntimeException -- prefer subclasses
* Use RandomUtils.getRandom(), not new Random(), for test repeatability
* We already have an IOUtils class that does (or can be augmented to do) what your IOUtil class does. Best to combine them.
* Can you comment on how this replaces, extends, complements the 2+ existing SVD implementation we have already? (In the javadoc ideally)
* extend MahoutTestCase, not TestCase

> Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure compatibility with current Mahout dependencies.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-593
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-593
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.4
>            Reporter: Dmitriy Lyubimov
>             Fix For: 0.5
>
>         Attachments: MAHOUT-593.patch.gz
>
>
> Current Mahout-376 patch requries 'new' hadoop API.  Certain elements of that API (namely, multiple outputs) are not available in standard hadoop 0.20.2 release. As such, that may work only with either CDH or 0.21 distributions. 
>  In order to bring it into sync with current Mahout dependencies, a backport of the patch to 'old' API is needed. 
> Also, some work is needed to resolve math dependencies. Existing patch relies on apache commons-math 2.1 for eigen decomposition of small matrices. This dependency is not currently set up in the mahout core. So, certain snippets of code are either required to go to mahout-math or use Colt eigen decompositon (last time i tried, my results were mixed with that one. It seems to produce results inconsistent with those from mahout-math eigensolver, at the very least, it doesn't produce singular values in sorted order).
> So this patch is mainly moing some Mahout-376 code around.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira