Posted to dev@bigtop.apache.org by "jay vyas (JIRA)" <ji...@apache.org> on 2013/11/14 20:25:20 UTC

[jira] [Commented] (BIGTOP-1128) FIX and modularize mahout sample data sets

    [ https://issues.apache.org/jira/browse/BIGTOP-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822795#comment-13822795 ] 

jay vyas commented on BIGTOP-1128:
----------------------------------

This patch is ready for review! It will:

1) run faster for the MovieLens data
2) fail faster (if/when URLs become obsolete)
3) be more maintainable
4) possibly help people understand the Mahout operations that Bigtop supports, thanks to the added comments

> FIX and modularize mahout sample data sets
> ------------------------------------------
>
>                 Key: BIGTOP-1128
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1128
>             Project: Bigtop
>          Issue Type: Bug
>            Reporter: jay vyas
>         Attachments: BIGTOP-1128.1.patch
>
>
> The Mahout smoke tests have a lot of dependencies.
> Concretely, we need to fix the URL of the MovieLens sample data, which has moved
> from http://www.grouplens.org/system/files/ml-1m.zip
> to http://files.grouplens.org/papers/ml-1m.zip
> Otherwise the Mahout smoke tests break, since the old URL no longer returns the data set.
> More generally, consolidating and verifying these download URLs in a separate function might make the tests simpler to debug. Otherwise, HTML documents get stored as .zip files, which causes a very hard-to-interpret error in the tests, i.e. an exception saying the zip file isn't formatted correctly.
> Any other thoughts on how to simplify and isolate the moving parts of the Mahout tests?
> We can bundle them into a patch. It would be a shame if the only thing this JIRA resulted in was a fix to a single URL :)
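
The idea in the description of keeping the download URLs in one place and verifying each download before the tests use it could look roughly like the sketch below. It is written in Python purely for illustration and is not the code in the attached patch. The names DATASET_URLS, BadDownloadError, verify_zip and fetch_dataset are hypothetical; only the ml-1m URL comes from this issue.

```python
import io
import urllib.request
import zipfile

# Hypothetical registry of sample data sets, so a moved URL is fixed in one place.
# Only the ml-1m URL below is taken from this issue.
DATASET_URLS = {
    "movielens-1m": "http://files.grouplens.org/papers/ml-1m.zip",
}


class BadDownloadError(RuntimeError):
    """Raised when a downloaded payload is not the archive we expected."""


def verify_zip(data, url):
    """Return data unchanged if it is a zip archive, otherwise fail fast."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        # A stale URL often returns an HTML page; show the first bytes
        # so the failure is easy to interpret.
        raise BadDownloadError(
            "%s did not return a zip archive (starts with %r); has the URL moved?"
            % (url, data[:40])
        )
    return data


def fetch_dataset(name, opener=urllib.request.urlopen):
    """Download a registered data set and verify it before returning the bytes.

    The opener parameter is an assumption added so the function can be
    exercised without network access.
    """
    url = DATASET_URLS[name]
    with opener(url) as response:
        return verify_zip(response.read(), url)
```

With a check like this, an obsolete URL surfaces as an error naming the URL at download time, rather than as a zip-format exception later in the test run.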



--
This message was sent by Atlassian JIRA
(v6.1#6144)