Posted to dev@bigtop.apache.org by "jay vyas (JIRA)" <ji...@apache.org> on 2014/08/25 04:18:57 UTC
[jira] [Comment Edited] (BIGTOP-1414) Add Apache Spark implementation to BigPetStore

    [ https://issues.apache.org/jira/browse/BIGTOP-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108678#comment-14108678 ] 

jay vyas edited comment on BIGTOP-1414 at 8/25/14 2:18 AM:
-----------------------------------------------------------

Hi again [~jornfranke],

- The arch.dot file is in {{bigtop-bigpetstore}}. Once you open it in Graphviz or at http://sandbox.kidstrythisathome.com/erdos/index.html, the next steps will most likely be obvious.
- There is a build.gradle file, which already has Scala and Java support. In fact, some of the existing BigPetStore code relies on Scala, and you can easily find the Scala class in there.

So you will want to modify that build.gradle to include a Spark Maven dependency, and then write Spark classes as you normally would in any other app.
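
As a rough illustration of that last step, a Spark class in this project might look something like the sketch below. It assumes the Spark core artifact (for example the Maven coordinate org.apache.spark:spark-core_2.10) has already been added to build.gradle. The object name, app name, and sample data are made up for illustration and are not taken from the BigPetStore source.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; not part of the existing BigPetStore code.
object SparkHello {

  // Distributes a small in-memory data set and counts its elements.
  def countRecords(sc: SparkContext, records: Seq[String]): Long =
    sc.parallelize(records).count()

  def main(args: Array[String]): Unit = {
    // "local[2]" runs Spark in-process with two threads. A cluster run
    // would normally get its master from spark-submit instead.
    val conf = new SparkConf()
      .setAppName("bigpetstore-spark-sketch") // assumed name
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val n = countRecords(sc, Seq("dog-food", "cat-litter", "dog-food"))
      println(s"records: $n")
    } finally {
      sc.stop() // always release the context, even on failure
    }
  }
}
```

Keeping the logic in a method that takes a SparkContext, rather than inline in main, makes it easier to call the same code from a test or from another job.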


was (Author: jayunit100):
Hi.

- The arch.dot file is in {{bigtop-bigpetstore}}.
- There is a build.gradle file, which already has Scala and Java support. In fact, some of the existing BigPetStore code relies on Scala, and you can easily find the Scala class in there.

So you will want to modify that build.gradle to include a Spark Maven dependency, and then write Spark classes as you normally would in any other app.

> Add Apache Spark implementation to BigPetStore
> ----------------------------------------------
>
>                 Key: BIGTOP-1414
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1414
>             Project: Bigtop
>          Issue Type: Improvement
>          Components: blueprints
>    Affects Versions: backlog
>            Reporter: jay vyas
>             Fix For: 0.9.0
>
>
> Currently we only process data with Hadoop. Now it's time to add Spark to the BigPetStore application. This will demonstrate the difference between a MapReduce-based Hadoop implementation of a big data app and a Spark one.
> *We will need to*
> - Update the Graphviz arch.dot to diagram Spark as a new path.
> - Add a Spark job to the existing code, in a new package, that uses the existing Scala-based generator. We will use the generator inside a Spark job rather than in a Hadoop InputSplit.
> - Have the job output to an RDD, which can then be serialized to disk or fed into the next Spark job.
> *So, the next Spark job should*
> - group the data and write product summaries to a local file
> - run a product recommender against the input data set.
> We want the jobs to be runnable either as separate modules or as a single job, to leverage the RDD paradigm.
> So it will be interesting to see how the code is architected. Let's start the planning in this JIRA. I have some stuff I've informally hacked together; maybe I can attach an initial patch just to start a dialog.
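
The two-stage plan in the description could be sketched roughly as follows: one function produces an RDD of records, a second groups them into product summaries, and main chains the two. This is only a sketch under assumptions. The Transaction type, the product names, the output path, and the generate function (a stand-in for the existing Scala generator, whose real API is not shown here) are all hypothetical, and the recommender step is left out.

```scala
import java.io.{File, PrintWriter}

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on older Spark 1.x releases
import org.apache.spark.rdd.RDD

// Hypothetical record type; the real generator's output may differ.
case class Transaction(customer: String, product: String, price: Double)

object PetStorePipeline {

  // Stage 1: stand-in for the existing generator, run inside Spark
  // instead of inside a Hadoop InputSplit.
  def generate(sc: SparkContext, n: Int): RDD[Transaction] = {
    val products = Vector("dog-food", "cat-litter", "fish-flakes")
    sc.parallelize(0 until n).map { i =>
      val idx = i % products.size
      Transaction(s"customer-${i % 10}", products(idx), 1.0 + idx)
    }
  }

  // Stage 2: group by product and total the revenue per product.
  def summarize(txs: RDD[Transaction]): RDD[(String, Double)] =
    txs.map(t => (t.product, t.price)).reduceByKey(_ + _)

  // Collects the (small) summary to the driver and writes a local file.
  def writeSummaries(summaries: RDD[(String, Double)], path: String): Unit = {
    val out = new PrintWriter(new File(path))
    try {
      summaries.collect().sortBy(_._1).foreach { case (product, total) =>
        out.println(s"$product\t$total")
      }
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("bigpetstore-pipeline-sketch") // assumed name
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // cache() keeps the generated data in memory so that a later stage
      // (for example a recommender) could reuse it without regenerating.
      val txs = generate(sc, 1000).cache()
      writeSummaries(summarize(txs), "product-summaries.tsv")
    } finally {
      sc.stop()
    }
  }
}
```

Because each stage takes and returns an RDD, the stages can be chained in one job as in main above, or run separately with the intermediate RDD saved to disk and reloaded in between.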



--
This message was sent by Atlassian JIRA
(v6.2#6252)