Posted to dev@bigtop.apache.org by "jay vyas (JIRA)" <ji...@apache.org> on 2014/11/20 01:44:33 UTC

[jira] [Issue Comment Deleted] (BIGTOP-1535) Add Spark ETL script to BigPetStore

     [ https://issues.apache.org/jira/browse/BIGTOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas updated BIGTOP-1535:
-----------------------------
    Comment: was deleted

(was: Let's also add the {{arch.dot}} for the Spark pipeline in this patch.

I'm actually wondering whether you really need Spark ETL. I think MapReduce is great for ETL, and the Spark components really shine at demonstrating in-place processing of data, so they should focus more on that.

But I'm open to a pure ETL step if you (or others) think that's a good path forward.)

> Add Spark ETL script to BigPetStore
> -----------------------------------
>
>                 Key: BIGTOP-1535
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1535
>             Project: Bigtop
>          Issue Type: Improvement
>          Components: blueprints
>            Reporter: RJ Nowling
>
> We should add a script that reads the results from the data generator, normalizes the data, and splits it into separate tables (ETL). It would be nice to use Spark SQL, but it is not required.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)