Posted to dev@bigtop.apache.org by "jay vyas (JIRA)" <ji...@apache.org> on 2014/04/25 04:40:18 UTC

[jira] [Resolved] (BIGTOP-1212) Pick or build a framework for building fake data sets

     [ https://issues.apache.org/jira/browse/BIGTOP-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas resolved BIGTOP-1212.
------------------------------

    Resolution: Fixed

We are currently using BigPetStore's data set generator to produce a rich data set of arbitrary size.

The BigPetStore TransactionInputFormat can be modified to generate other types of data if needed.   

In the future we could build a broader data set generator, or turn the BigPetStore generator into a generic framework for producing fake data sets.

If there is interest, we can open another JIRA to make the custom input splits in BigPetStore more generic.
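
To make the idea of generic, split-based fake data concrete, here is a minimal sketch in plain Java. It is not the BigPetStore code and does not use the Hadoop InputFormat API; the class name FakeTransactionGenerator, the record layout, and the product and state lists are all hypothetical. It assumes each record is derived from a seed and its index, so the same data comes out no matter how many splits are used, and a small VM run can generate fewer records with the same code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch only: not the BigPetStore generator, no Hadoop dependency.
final class FakeTransactionGenerator {

    // One "split": the contiguous range of record indices [start, start + length).
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    // Illustrative values; a real generator would take these as configuration.
    private static final String[] STATES = {"AZ", "CA", "CO", "NY"};
    private static final String[] PRODUCTS = {"dog-food", "cat-litter", "fish-flakes", "leash"};

    // Divide totalRecords into numSplits ranges; the remainder goes to the first splits.
    static List<Split> splits(long totalRecords, int numSplits) {
        if (totalRecords < 0 || numSplits <= 0) {
            throw new IllegalArgumentException("need totalRecords >= 0 and numSplits > 0");
        }
        List<Split> out = new ArrayList<>();
        long base = totalRecords / numSplits;
        long extra = totalRecords % numSplits;
        long start = 0;
        for (int i = 0; i < numSplits; i++) {
            long length = base + (i < extra ? 1 : 0);
            out.add(new Split(start, length));
            start += length;
        }
        return out;
    }

    // Build one record from (seed, index) alone, so output is independent of the split layout.
    static String record(long seed, long index) {
        Random rng = new Random(seed ^ (index * 0x9E3779B97F4A7C15L));
        int state = rng.nextInt(STATES.length);
        // The "pattern": half of the time a state buys its favorite product.
        int product = rng.nextBoolean() ? state % PRODUCTS.length : rng.nextInt(PRODUCTS.length);
        return index + "," + STATES[state] + "," + PRODUCTS[product];
    }

    // Generate every record that belongs to one split.
    static List<String> generate(long seed, Split split) {
        List<String> out = new ArrayList<>();
        for (long i = split.start; i < split.start + split.length; i++) {
            out.add(record(seed, i));
        }
        return out;
    }

    public static void main(String[] args) {
        // Print a tiny data set produced from two splits.
        for (Split s : splits(6, 2)) {
            generate(42L, s).forEach(System.out::println);
        }
    }
}
```

Making this generic would mostly mean replacing record() with a pluggable function; the split arithmetic would stay the same.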

> Pick or build a framework for building fake data sets
> -----------------------------------------------------
>
>                 Key: BIGTOP-1212
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1212
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: Blueprints
>    Affects Versions: 0.7.0
>            Reporter: jay vyas
>             Fix For: 0.8.0
>
>
> - We've already seen that the Mahout smoke tests are fragile because they require many external input data sets.
> - Also, in BigPetStore (BIGTOP-1089), we are building custom fake data generators so that we can build arbitrarily large data sets of customer transactions with patterns in them.
> So let's either (1) build a framework or (2) adopt one that is modular enough to extend for different smoke test scenarios.
> ADVANTAGES:
> - VM tests can run the exact same smokes that real tests run, and just generate smaller input data sets. Right now, we can't do this with static external data sets.
> - We can start eliminating fragile external dependencies of smoke tests (e.g. the Mahout ones) and replace them with our own data sets generated on the fly, with no need to wget them from third parties.
> - BigPetStore can focus on demoing the Bigtop-based Hadoop ecosystem deployment, rather than on generating data.



--
This message was sent by Atlassian JIRA
(v6.2#6252)