Posted to dev@crunch.apache.org by "Kiyan Ahmadizadeh (JIRA)" <ji...@apache.org> on 2012/07/10 03:24:33 UTC

[jira] [Updated] (CRUNCH-9) Add support for launching Scrunch pipelines from a REPL

     [ https://issues.apache.org/jira/browse/CRUNCH-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kiyan Ahmadizadeh updated CRUNCH-9:
-----------------------------------

    Attachment: CRUNCH-9.patch

    This commit modifies the Scrunch project so that Scrunch jobs can be run from
    a Scala REPL. To start a REPL that can launch Scrunch jobs, build Scrunch with
    `mvn package` and run bin/scrunch from the resulting distribution directory.
    The following changes make this possible:
    
    1. The project now produces a release distribution. Maven builds it when
    `mvn package` is run, creating both a distribution folder and a tarball. The
    distribution folder contains a bin dir with scripts, a lib dir with all
    library jars, and a log dir with a log4j configuration file.
    
    2. A modified Scala REPL was added to the project. A new object,
    InterpreterRunner, launches it. InterpreterRunner is adapted from Scala's
    MainGenericRunner. Unlike the original, the Scrunch version lets client code
    determine whether a REPL is actually running, and includes methods for
    packaging the code compiled from REPL input into a jar. A new script named
    "scrunch" launches this modified REPL. It is adapted from the script that
    ships with Scala to launch the standard REPL.
    
    3. Scrunch's Pipeline class was modified so that every MapReduce pipeline it
    constructs automatically adds the Scrunch lib jars to the job's distributed
    cache and to the classpath of the job's tasks.
    
    4. Methods on PCollection, PTable, etc. that cause a job to be launched were
    modified to check whether the REPL is running. If it is, they create a jar of
    the code compiled from REPL input and ship that jar with the job so that it is
    on the classpath of the job's tasks.
    
    5. To make extension easier, the From, To, and At objects were changed to
    traits, and singleton objects with the same names that extend those traits
    were added.
    
    6. The examples in the examples directory, and the scrunch.py script for
    running them, are now included in the distribution. scrunch.py was renamed to
    scrunch-job.py and modified to work with the new distribution layout and to
    take advantage of the fact that the Scrunch lib jars are now added to the
    classpath of launched jobs automatically.
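Items 2 and 4 together describe a check-then-package step that runs before a job launches. The Scala sketch below illustrates that step under stated assumptions. It assumes the REPL writes its compiled classes to an ordinary directory on disk. `ReplJar`, `replRunning`, `createJar` and `jarForJob` are hypothetical names, not the patch's actual API. The sketch only builds the jar and leaves out shipping it through the distributed cache.

```scala
import java.io.{File, FileOutputStream}
import java.nio.file.Files
import java.util.jar.{JarEntry, JarOutputStream}

// Hypothetical helper illustrating items 2 and 4; the patch's
// InterpreterRunner provides its own equivalents of these methods.
object ReplJar {
  // Hypothetical flag that a REPL launcher would set on startup.
  @volatile var replRunning: Boolean = false

  // Packs every file under classDir into jarFile, preserving relative paths
  // so that <classDir>/pkg/A.class becomes the jar entry "pkg/A.class".
  def createJar(classDir: File, jarFile: File): File = {
    val out = new JarOutputStream(new FileOutputStream(jarFile))
    try addDir(classDir, "", out) finally out.close()
    jarFile
  }

  private def addDir(dir: File, prefix: String, out: JarOutputStream): Unit = {
    val children = Option(dir.listFiles()).getOrElse(Array.empty[File])
    for (f <- children.sortBy(_.getName)) {
      val name = prefix + f.getName
      if (f.isDirectory) {
        // Jar directory entries use "/" as the separator and end with "/".
        out.putNextEntry(new JarEntry(name + "/"))
        out.closeEntry()
        addDir(f, name + "/", out)
      } else {
        out.putNextEntry(new JarEntry(name))
        Files.copy(f.toPath, out) // stream the file's bytes into the entry
        out.closeEntry()
      }
    }
  }

  // Called before a job launches. Outside the REPL, user classes are assumed
  // to be in the job jar already, so no extra jar is built.
  def jarForJob(classDir: File): Option[File] =
    if (replRunning) Some(createJar(classDir, File.createTempFile("repl-classes", ".jar")))
    else None
}
```

Returning an `Option` keeps the non-REPL path free of side effects: callers ship a jar only when one was actually created.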
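Item 5's change follows a common Scala pattern. A singleton object cannot be extended, so the shared methods move into a trait, and a same-named object extends that trait to keep the old call syntax. The minimal sketch below illustrates the pattern. `Source` is a hypothetical stand-in for the Crunch source types that the real From methods return. `MyFrom` and `customFile` are hypothetical, and the signatures are simplified for illustration.

```scala
// Hypothetical stand-in for the Crunch source types returned by From.
final case class Source(format: String, path: String)

// Factory methods move from a singleton object into a trait...
trait From {
  def textFile(path: String): Source = Source("text", path)
}

// ...and a same-named singleton extends the trait, so existing calls such
// as From.textFile("/input") keep compiling unchanged.
object From extends From

// Hypothetical extension: another project reuses every built-in factory
// method and adds its own. This was impossible while From was an object.
object MyFrom extends From {
  def customFile(path: String): Source = Source("custom", path)
}
```

The trait carries the implementations. As a result, code that accepts a `From` can receive either the stock singleton or an extended one.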
    
    I started writing an integration test that actually launches jobs, but the
    MiniMRCluster testing framework does not behave correctly when jars are added
    to the distributed cache. The problem is related to MAPREDUCE-2884. I have
    verified on a real cluster that jobs can be launched from the REPL.

                
> Add support for launching Scrunch pipelines from a REPL
> -------------------------------------------------------
>
>                 Key: CRUNCH-9
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-9
>             Project: Crunch
>          Issue Type: New Feature
>          Components: Scrunch
>            Reporter: Josh Wills
>         Attachments: CRUNCH-9.patch
>
>
> It would be really, really cool and useful to be able to launch a Scrunch pipeline from a Scala-based REPL, which was one of the killer apps for Cascade, Google's Scala-based wrapper around FlumeJava.
> See the video from Scala Days 2011 for a reference: http://days2011.scala-lang.org/node/138/282
