Posted to dev@mrql.apache.org by "Leonidas Fegaras (JIRA)" <ji...@apache.org> on 2013/10/28 14:05:31 UTC

[jira] [Updated] (MRQL-25) Changed Translator/Evaluator interface to improve Spark efficiency

     [ https://issues.apache.org/jira/browse/MRQL-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leonidas Fegaras updated MRQL-25:
---------------------------------

    Attachment: MRQL-25.patch

> Changed Translator/Evaluator interface to improve Spark efficiency
> ------------------------------------------------------------------
>
>                 Key: MRQL-25
>                 URL: https://issues.apache.org/jira/browse/MRQL-25
>             Project: MRQL
>          Issue Type: Improvement
>            Reporter: Leonidas Fegaras
>            Assignee: Leonidas Fegaras
>            Priority: Minor
>         Attachments: MRQL-25.patch
>
>
> This patch extends the old interface between the translator and the three evaluators to make Spark evaluation more efficient. The old interface relied solely on the collect method, which lazily collects data from a distributed dataset and returns an Iterator. Spark, however, has no public method for pulling data from an RDD as an Iterator, so I previously had to dump the RDD into a binary file and create an Iterator reader over it, which was inefficient. In addition to collect, the interface now includes take and reduce: take returns the first elements of a dataset, and reduce combines the values of a dataset using an associative accumulator. Some translator methods, such as print and dump to binary or text files, had to be rewritten to use the new interface.
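
For illustration only, the three operations described above could be sketched in Java roughly as follows. This is not the code from MRQL-25.patch: the names Dataset, InMemoryDataset and DatasetSketch, the method signatures, and the explicit zero argument to reduce are all assumptions, and the in-memory list merely stands in for a distributed dataset so that the sketch can run on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical interface: names and signatures are illustrative, not MRQL's actual API.
interface Dataset<T> {
    // Pull every element back to the caller through an Iterator.
    Iterator<T> collect();

    // Return at most the first n elements.
    List<T> take(int n);

    // Fold all elements with an associative accumulator, starting from zero.
    T reduce(T zero, BinaryOperator<T> accumulator);
}

// In-memory stand-in for a distributed dataset (assumption, used only to make this runnable).
class InMemoryDataset<T> implements Dataset<T> {
    private final List<T> elements;

    InMemoryDataset(List<T> elements) {
        this.elements = new ArrayList<>(elements);
    }

    @Override
    public Iterator<T> collect() {
        return elements.iterator();
    }

    @Override
    public List<T> take(int n) {
        // Clamp n so that negative or oversized requests are safe.
        int end = Math.max(0, Math.min(n, elements.size()));
        return new ArrayList<>(elements.subList(0, end));
    }

    @Override
    public T reduce(T zero, BinaryOperator<T> accumulator) {
        T result = zero;
        for (T element : elements) {
            result = accumulator.apply(result, element);
        }
        return result;
    }
}

class DatasetSketch {
    public static void main(String[] args) {
        Dataset<Integer> data = new InMemoryDataset<>(Arrays.asList(3, 1, 4, 1, 5));
        System.out.println(data.take(2));                  // first two elements
        System.out.println(data.reduce(0, Integer::sum));  // sum of all elements
    }
}
```

In a distributed evaluator, the accumulator has to be associative (and the zero value an identity for it) so that partial results computed per partition can be combined in any grouping. With take and reduce available, an operation such as print would presumably no longer need to materialize the whole dataset through an Iterator.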



--
This message was sent by Atlassian JIRA
(v6.1#6144)