Posted to dev@bigtop.apache.org by "jay vyas (JIRA)" <ji...@apache.org> on 2014/02/16 04:24:19 UTC

[jira] [Created] (BIGTOP-1212) Pick or build a framework for building fake data sets

jay vyas created BIGTOP-1212:
--------------------------------

             Summary: Pick or build a framework for building fake data sets
                 Key: BIGTOP-1212
                 URL: https://issues.apache.org/jira/browse/BIGTOP-1212
             Project: Bigtop
          Issue Type: New Feature
            Reporter: jay vyas


- We've already seen that the Mahout smoke tests are fragile because they depend on many external input data sets.
- Also, in BigPetStore (BIGTOP-1089) we are building custom fake data generators so that we can build arbitrarily large data sets of customer transactions with patterns in them.

So -- let's either (1) build a framework or (2) adopt one that is modular enough to extend for different smoke test scenarios.

ADVANTAGES:

- VM tests can run the exact same smokes that real tests run, and just generate smaller input data sets. Right now, we can't do this with static external data sets.
- We can start eliminating fragile external dependencies of smoke tests (i.e. the Mahout ones) and replace them with our own data sets generated on the fly, with no need to wget them from 3rd parties.
- BigPetStore can focus on demo'ing the Bigtop-based Hadoop ecosystem deployment, rather than on generating data.
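
To make "modular enough to extend" a bit more concrete, here is a minimal sketch of one possible shape for such a framework. It is only an illustration: the registry, the `register` decorator, the `generate` function, the "transactions" scenario, and the record layout are all hypothetical names invented for this example, not existing Bigtop or BigPetStore code. The idea shown is that each scenario registers a generator that takes a record count and a seed, so a VM run and a cluster run call the same code and differ only in size.

```python
import random
from typing import Callable, Dict, Iterator, List

# Hypothetical registry: scenario name -> generator function(n_records, seed).
GENERATORS: Dict[str, Callable[[int, int], Iterator[str]]] = {}


def register(name: str):
    """Decorator that adds a generator to the registry under `name`."""
    def wrap(fn):
        GENERATORS[name] = fn
        return fn
    return wrap


@register("transactions")
def transactions(n_records: int, seed: int) -> Iterator[str]:
    """Yield fake CSV lines: id,customer,product,price (assumed layout)."""
    rng = random.Random(seed)  # private RNG so runs are reproducible
    products = ["dog-food", "cat-litter", "fish-flakes"]
    n_customers = max(1, n_records // 10)
    for i in range(n_records):
        customer = rng.randint(1, n_customers)
        # Skewed weights plant a simple pattern for downstream jobs to find.
        product = rng.choices(products, weights=[5, 3, 1])[0]
        price = round(rng.uniform(1.0, 50.0), 2)
        yield f"{i},{customer},{product},{price}"


def generate(name: str, n_records: int, seed: int = 0) -> List[str]:
    """Materialize `n_records` lines from the named scenario."""
    return list(GENERATORS[name](n_records, seed))
```

Under these assumptions, a VM smoke test might call generate("transactions", 100) while a real cluster test asks for a much larger count (and would likely stream the iterator to a file rather than build a list). A new smoke test scenario would add its own function with @register instead of fetching a static external file.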



