Posted to dev@reef.apache.org by "Markus Weimer (JIRA)" <ji...@apache.org> on 2017/11/09 23:27:00 UTC

[jira] [Assigned] (REEF-424) Add Iterative Map-Reduce-Update

     [ https://issues.apache.org/jira/browse/REEF-424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Weimer reassigned REEF-424:
----------------------------------

    Assignee:     (was: Markus Weimer)

> Add Iterative Map-Reduce-Update
> -------------------------------
>
>                 Key: REEF-424
>                 URL: https://issues.apache.org/jira/browse/REEF-424
>             Project: REEF
>          Issue Type: New Feature
>          Components: IMRU, REEF.NET
>            Reporter: Markus Weimer
>
> Many popular machine learning algorithms can be expressed in what's known as the statistical query model (SQM): They rely on aggregate statistics, not random data access. In the most common case, those statistics are aggregates of functions applied to each record in the dataset. Such queries map trivially to the map-reduce programming paradigm.
> However, most ML algorithms perform many such queries in iterations. This leads to inefficiencies on traditional map-reduce systems: Each query turns into a job which needs to be scheduled, its input needs to be read and its output needs to be persisted.
> We propose Iterative Map Reduce Update (IMRU), a simple extension to the map-reduce abstraction to capture such programs in three functions:
>   * {{TMapOutput Map(TMapInput input)}} is a map function with side information. It is assumed to have access to the training data through other means, and the {{input}} it receives is the mutable state of the computation, as produced by the {{Update}} function.
>   * {{TMapOutput Reduce(params TMapOutput[] mapoutput)}} is a (pure) reduce function.
>   * {{Tuple<TMapInput,TResult> Update(TMapOutput mapoutput)}} takes the (aggregated) outputs from the Map functions and produces a new set of inputs for them, a result of the computation, or both. Computation terminates if no further {{TMapInput}} is produced.
> As part of this work, we will introduce the IMRU API, a local (threaded) test harness as well as an implementation on top of REEF. Actually getting the data into the mappers is out of scope here and will be part of another JIRA.
> This JIRA serves as an umbrella for work leading to an IMRU implementation on REEF.
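The three IMRU functions and the local (threaded) test harness mentioned in the description can be illustrated with a minimal sketch. It is written in Python for brevity and is not the REEF.NET API: the names run_imru_locally, make_mapper, reduce_gradients and GradientStep are hypothetical. It also assumes that the caller supplies the first TMapInput (the description does not say where it comes from) and that each mapper's "other means" of data access is a closure over an in-memory partition. The example query is a least-squares gradient for a single weight, an SQM-style aggregate: each mapper sums a per-record function over its partition and Reduce adds the partial sums.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence, Tuple, TypeVar

TMapInput = TypeVar("TMapInput")
TMapOutput = TypeVar("TMapOutput")
TResult = TypeVar("TResult")


def run_imru_locally(
    mappers: Sequence[Callable[[TMapInput], TMapOutput]],
    reduce_fn: Callable[..., TMapOutput],
    update_fn: Callable[[TMapOutput], Tuple[Optional[TMapInput], Optional[TResult]]],
    initial_input: TMapInput,
    max_iterations: int = 1000,
) -> Optional[TResult]:
    """Hypothetical local IMRU harness: mappers run on a thread pool."""
    result: Optional[TResult] = None
    map_input: Optional[TMapInput] = initial_input
    with ThreadPoolExecutor(max_workers=max(1, len(mappers))) as pool:
        for _ in range(max_iterations):
            if map_input is None:
                break  # Update produced no further TMapInput: terminate.
            # Broadcast the current state to every mapper, run them in parallel.
            futures = [pool.submit(m, map_input) for m in mappers]
            outputs = [f.result() for f in futures]
            # Aggregate all map outputs with the pure reduce function.
            aggregated = reduce_fn(*outputs)
            # Update yields the next map input, a result, or both.
            map_input, new_result = update_fn(aggregated)
            if new_result is not None:
                result = new_result
    return result


def make_mapper(partition):
    """Hypothetical mapper bound to its own (x, y) partition (side information)."""
    def map_fn(w):
        # Partial sum of d/dw (w*x - y)^2 over this partition, plus record count.
        grad = sum(2.0 * (w * x - y) * x for x, y in partition)
        return (grad, len(partition))
    return map_fn


def reduce_gradients(*outputs):
    """Pure reduce: add partial gradient sums and record counts."""
    return (sum(g for g, _ in outputs), sum(n for _, n in outputs))


class GradientStep:
    """Hypothetical stateful Update: one gradient-descent step per iteration."""

    def __init__(self, w0, lr, tol):
        self.w, self.lr, self.tol = w0, lr, tol

    def __call__(self, aggregated):
        grad_sum, n = aggregated
        grad = grad_sum / n
        self.w -= self.lr * grad
        if abs(grad) < self.tol:
            return None, self.w  # Converged: emit a result, no new map input.
        return self.w, None      # Continue: new map input, no result yet.


# Two partitions of points on the line y = 3x.
partitions = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
update = GradientStep(w0=0.0, lr=0.05, tol=1e-9)
result = run_imru_locally(
    [make_mapper(p) for p in partitions], reduce_gradients, update, update.w
)
print(result)  # close to 3.0
```

Keeping Update stateful is one way to reconcile the signature in the description, which passes Update only the aggregated map output: the current model has to live somewhere, and here it lives in the Update object rather than being threaded through the map outputs.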



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)