Posted to dev@mrql.apache.org by "Leonidas Fegaras (JIRA)" <ji...@apache.org> on 2013/07/21 09:54:48 UTC

[jira] [Created] (MRQL-12) Support query evaluation in Spark mode

Leonidas Fegaras created MRQL-12:
------------------------------------

             Summary: Support query evaluation in Spark mode
                 Key: MRQL-12
                 URL: https://issues.apache.org/jira/browse/MRQL-12
             Project: MRQL
          Issue Type: Improvement
          Components: Run-Time Data
    Affects Versions: 0.9.0
         Environment: Apache Spark http://spark-project.org/
            Reporter: Leonidas Fegaras
            Assignee: Leonidas Fegaras


Spark provides primitives for in-memory cluster computing (http://spark-project.org/). It was developed at UC Berkeley and was recently accepted as an ASF incubating project. It has already attracted many developers and I think it will play a major role in the Hadoop ecosystem. So, I thought it would be nice to be able to evaluate MRQL queries on a Spark cluster. Spark already supports Hive (through Shark). Like Hama, Spark can evaluate queries in memory but, unlike Hama, it supports full fault-tolerance. I have already written all the code, but I have only tested it in local mode (on a single multi-core node). This task turned out to be easier than I thought because MRQL plans are similar to Spark operations. The only annoyance was that I had to make all data structures Serializable. I also had to include the source code of Gen (the Java preprocessor), under the ASF licence, which will make the transition to Maven easier.
I am attaching the patch below. The Spark evaluator itself is in the file Evaluator.gen, which is attached separately.
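
To illustrate the Serializable requirement mentioned above: Spark's default serializer relies on Java serialization, so any object shipped between nodes has to implement java.io.Serializable. The following is only a minimal sketch of that constraint using plain JDK classes. The KeyValue class and the helper names are hypothetical, chosen for illustration, and are not taken from the MRQL patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableSketch {
    // Hypothetical value type (not an actual MRQL class). Implementing
    // Serializable is what lets Java serialization write it to a byte stream.
    static final class KeyValue implements Serializable {
        private static final long serialVersionUID = 1L;
        final String key;
        final long value;

        KeyValue(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    // Serialize an object to bytes, as would happen before it is sent to another node.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Rebuild the object from bytes, as would happen on the receiving node.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        KeyValue copy = (KeyValue) fromBytes(toBytes(new KeyValue("count", 42L)));
        System.out.println(copy.key + " = " + copy.value);
    }
}
```

If KeyValue did not implement Serializable, the call to writeObject would fail with a java.io.NotSerializableException, which is the kind of error that forces every data structure reachable from a shipped object to be made Serializable.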

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira