Posted to dev@reef.apache.org by "Dhruv Mahajan (JIRA)" <ji...@apache.org> on 2015/10/23 19:17:27 UTC

[jira] [Commented] (REEF-868) REEF.IO currently does not provide an interface or functionality for evaluators to write output

    [ https://issues.apache.org/jira/browse/REEF-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971390#comment-14971390 ] 

Dhruv Mahajan commented on REEF-868:
------------------------------------

Markus
======

Beysim has done some work for this in the CDR effort. The idea was to have infrastructure code in charge of all memory allocations. The `Task` gets, e.g., an `IOutputReceiver` injected, which has methods to create a new output stream for a partition and then write to it. The motivation for this approach was to facilitate efficient memory re-use between writes and to avoid unneeded data copies.
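
To make the idea concrete, here is a minimal sketch of how such a receiver might look. Only the name `IOutputReceiver` comes from the discussion above; the member `CreateOutputStream`, the classes `InMemoryOutputReceiver` and `WordCountTask`, and the plain constructor call standing in for injection are all hypothetical and are not taken from the CDR code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical shape: the thread names IOutputReceiver but not its members.
public interface IOutputReceiver
{
    // Returns a stream owned by the infrastructure for the given partition.
    Stream CreateOutputStream(string partitionId);
}

// Toy infrastructure-side implementation: one in-memory buffer per partition,
// so allocation decisions stay out of the task's hands.
public sealed class InMemoryOutputReceiver : IOutputReceiver
{
    private readonly Dictionary<string, MemoryStream> _buffers =
        new Dictionary<string, MemoryStream>();

    public Stream CreateOutputStream(string partitionId)
    {
        MemoryStream buffer;
        if (!_buffers.TryGetValue(partitionId, out buffer))
        {
            buffer = new MemoryStream();
            _buffers[partitionId] = buffer;
        }
        buffer.SetLength(0); // reuse the existing buffer instead of reallocating
        return buffer;
    }

    public string ReadBack(string partitionId)
    {
        return Encoding.UTF8.GetString(_buffers[partitionId].ToArray());
    }
}

// Stand-in for a REEF task; in REEF the receiver would arrive via injection.
public sealed class WordCountTask
{
    private readonly IOutputReceiver _output;

    public WordCountTask(IOutputReceiver output) { _output = output; }

    public void Run()
    {
        Stream stream = _output.CreateOutputStream("partition-0");
        byte[] payload = Encoding.UTF8.GetBytes("hello 1\n");
        stream.Write(payload, 0, payload.Length);
        stream.Flush();
    }
}

public static class Program
{
    public static void Main()
    {
        var receiver = new InMemoryOutputReceiver();
        new WordCountTask(receiver).Run();
        Console.Write(receiver.ReadBack("partition-0"));
    }
}
```

Because the receiver hands out the stream, it can recycle the same buffer across writes, which is the memory re-use the approach is after. The task never allocates a buffer itself.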

Regarding the below, I’m not a huge fan of option 2. Mixing reading and writing in the same interface leads to terrible APIs like the .NET Stream classes: decisions that the type system should enforce are put in the hands of API users, who are bound to get them wrong. It also precludes many static optimizations we might want to do down the line, which depend on knowing that some type is a sink or a source of data.

To me, a design following options (1) and (2) below makes the most sense (right now):

•	Add `Input` to the names of all the current interfaces, e.g. `IPartitionedInputDataSet`.
•	Make similar interfaces for output, e.g. `IOutputPartition<T>`.
•	Maybe have them share a base interface, if there is anything in common.

`IOutputPartition` would have a method `T Create()` or such which creates a new T (e.g. a Stream) that the task can then write to. What I am not clear on is: Who decides the partition ID? Shall it be the Task? Or is it decided by the Driver?
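
A rough sketch of what such an output interface could look like follows. The `Id` property, the `FileOutputPartition` class, and the file layout are assumptions made for illustration only. In particular, the sketch lets whoever constructs the partition choose its ID, which does not settle whether that should be the Task or the Driver.

```csharp
using System;
using System.IO;

// Hypothetical output-side counterpart of the existing input partition interface.
public interface IOutputPartition<T>
{
    // Who assigns this (Task or Driver) is the open question; here the creator does.
    string Id { get; }

    // Creates a new T (e.g. a Stream) that the task can then write to.
    T Create();
}

// Toy file-backed implementation; the "<directory>/<id>.out" layout is an assumption.
public sealed class FileOutputPartition : IOutputPartition<Stream>
{
    private readonly string _directory;

    public FileOutputPartition(string id, string directory)
    {
        Id = id;
        _directory = directory;
    }

    public string Id { get; private set; }

    public Stream Create()
    {
        Directory.CreateDirectory(_directory);
        return File.Create(Path.Combine(_directory, Id + ".out"));
    }
}

public static class Program
{
    public static void Main()
    {
        string dir = Path.Combine(Path.GetTempPath(), "reef-output-sketch");
        IOutputPartition<Stream> partition = new FileOutputPartition("partition-0", dir);

        // The task only sees Create(); it never deals with paths or naming.
        using (var writer = new StreamWriter(partition.Create()))
        {
            writer.WriteLine("record 1");
        }

        Console.WriteLine(File.ReadAllText(Path.Combine(dir, "partition-0.out")).Trim());
    }
}
```

Keeping this interface free of any read members means a reference of type `IOutputPartition<T>` can only ever act as a sink, so the type system enforces that decision instead of the API user.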


> REEF.IO currently does not provide an interface or functionality for evaluators to write output
> -----------------------------------------------------------------------------------------------
>
>                 Key: REEF-868
>                 URL: https://issues.apache.org/jira/browse/REEF-868
>             Project: REEF
>          Issue Type: New Feature
>          Components: REEF.NET IO
>    Affects Versions: 0.14
>         Environment: C#
>            Reporter: Dhruv Mahajan
>            Assignee: Dhruv Mahajan
>             Fix For: 0.14
>
>
> Currently, REEF.IO has interfaces such as IPartitionedDataSet and IPartition for data input, but no comparable interface for output. The aim of this JIRA is to introduce such an interface, along with a file-partition-based implementation.


