Posted to dev@bigtop.apache.org by "jay vyas (JIRA)" <ji...@apache.org> on 2013/11/14 20:17:20 UTC

[jira] [Updated] (BIGTOP-1128) FIX and modularize mahout sample data sets

     [ https://issues.apache.org/jira/browse/BIGTOP-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas updated BIGTOP-1128:
-----------------------------

    Attachment: BIGTOP-1128.1.patch

This patch:
1) Fixes the MovieLens URL, which was recently moved.
2) Modularizes all file downloads into a single function so they are easy to debug.
3) Adds a lot of necessary comments to the tests.
4) Reduces test time for MovieLens by only running 2 iterations instead of 
5) Parameterizes the iteration count in a variable that can easily be edited locally in the Groovy script. (A next iteration could add configuration support for all of this so the Mahout smokes can run faster, e.g. for the clustering jobs.)
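
For readers without the patch at hand, points 2 and 5 could be sketched roughly as below. This is an illustration only: it is written in Python rather than the Groovy of the actual smoke tests, and the names `DATASETS`, `ITERATIONS`, and `download` (including its `opener` parameter) are hypothetical and not taken from BIGTOP-1128.1.patch.

```python
import shutil
import urllib.request
from pathlib import Path

# Hypothetical registry: every sample data set URL lives in one place,
# so a moved URL only has to be fixed here.
DATASETS = {
    "movielens-1m": "http://files.grouplens.org/papers/ml-1m.zip",
}

# Hypothetical knob: iteration count kept in one variable for local edits.
ITERATIONS = 2


def download(name, dest_dir, opener=urllib.request.urlopen):
    """Fetch the data set registered under `name` into `dest_dir`.

    `opener` is injectable so the function can be exercised without
    network access; by default it is urllib.request.urlopen.
    """
    url = DATASETS[name]
    # Name the local file after the last path segment of the URL.
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    with opener(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

With every download going through one function, a broken URL fails in one identifiable place instead of somewhere inside an individual test.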

> FIX and modularize mahout sample data sets
> ------------------------------------------
>
>                 Key: BIGTOP-1128
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1128
>             Project: Bigtop
>          Issue Type: Bug
>            Reporter: jay vyas
>         Attachments: BIGTOP-1128.1.patch
>
>
> The Mahout smokes have a lot of dependencies.
> Concretely, we need to fix the MovieLens sample data URL, which has moved
> from http://www.grouplens.org/system/files/ml-1m.zip
> to http://files.grouplens.org/papers/ml-1m.zip
> Otherwise the Mahout smokes break for obvious reasons.
> More generally, consolidating and verifying these download URLs in a separate function might make the tests simpler to debug. Otherwise, HTML documents get stored as .zip files, which causes a very hard-to-interpret error in the tests, i.e. an exception saying the zip file isn't formatted correctly.
> Other thoughts on how to simplify and isolate the moving parts of the Mahout tests?
> We can bundle them into a patch. It would be a shame if the only thing this JIRA resulted in was a fix to a single URL :)
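
The "HTML document stored as a .zip file" failure described in the issue could be caught right after the download with a check along these lines. This is a minimal sketch in Python, not the Groovy used by the real tests; the helper name `require_zip` is hypothetical, and only the standard-library `zipfile.is_zipfile` call is relied upon.

```python
import zipfile


def require_zip(path):
    """Raise a descriptive error if `path` is not a readable zip archive."""
    if not zipfile.is_zipfile(path):
        # Show the first bytes so an HTML error page is obvious at a glance.
        with open(path, "rb") as f:
            head = f.read(64)
        raise ValueError(
            f"{path} is not a zip archive (starts with {head!r}); "
            "the download URL may have moved"
        )
    return path
```

Failing at this point names the bad file and shows its first bytes, which is easier to interpret than a later exception about a malformed zip.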



--
This message was sent by Atlassian JIRA
(v6.1#6144)