Posted to dev@reef.apache.org by "Markus Weimer (JIRA)" <ji...@apache.org> on 2015/10/01 02:02:04 UTC

[jira] [Commented] (REEF-538) Implement IPartitionedDataSet on IFileSystem

    [ https://issues.apache.org/jira/browse/REEF-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939083#comment-14939083 ] 

Markus Weimer commented on REEF-538:
------------------------------------

One likely extension point here is the deserialization of the individual files. We should design the API so that this part can be user-supplied, similar to how {{InputFormat}} and {{RecordReader}} interact in Hadoop. User code would then have to provide the deserializer.
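
To make the idea concrete, here is a rough sketch of how a pluggable deserializer could sit next to the partitioning logic. It is written in Python rather than C# purely for illustration, it uses the local file system as a stand-in for {{IFileSystem}}, and every name in it (Deserializer, utf8_lines, FilePartition, make_partitions, files_per_partition) is hypothetical and not part of REEF.

```python
import glob
from typing import BinaryIO, Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Hypothetical: a deserializer turns one open file into a sequence of records.
# It plays roughly the role RecordReader plays in Hadoop.
Deserializer = Callable[[BinaryIO], Iterable[T]]


def utf8_lines(stream: BinaryIO) -> Iterator[str]:
    """Example user-supplied deserializer: one record per line of UTF-8 text."""
    for raw in stream:
        yield raw.decode("utf-8").rstrip("\r\n")


class FilePartition:
    """Hypothetical partition backed by one or more files."""

    def __init__(self, paths: Iterable[str], deserializer: Deserializer) -> None:
        self._paths: List[str] = list(paths)
        self._deserializer = deserializer

    def records(self) -> Iterator:
        # Files are read one after another; the partition itself knows
        # nothing about the record format, only the deserializer does.
        for path in self._paths:
            with open(path, "rb") as stream:
                yield from self._deserializer(stream)


def make_partitions(
    patterns: Iterable[str],
    deserializer: Deserializer,
    files_per_partition: int = 1,
) -> List[FilePartition]:
    """Expand path expressions such as '*.txt' and group the files into partitions.

    Assumes files_per_partition >= 1. The local glob module stands in for
    whatever path expansion the real file system abstraction would offer.
    """
    paths = sorted({p for pattern in patterns for p in glob.glob(pattern)})
    return [
        FilePartition(paths[i:i + files_per_partition], deserializer)
        for i in range(0, len(paths), files_per_partition)
    ]
```

With this shape, a caller would write something like make_partitions(["data/*.txt"], utf8_lines) and iterate over records() on each partition. Supporting a different file format would only mean passing a different deserializer function, which is the separation the comment above argues for.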

> Implement IPartitionedDataSet on IFileSystem
> --------------------------------------------
>
>                 Key: REEF-538
>                 URL: https://issues.apache.org/jira/browse/REEF-538
>             Project: REEF
>          Issue Type: New Feature
>          Components: REEF.NET
>            Reporter: Markus Weimer
>
> The goal is to build an {{IPartitionedDataSet}} where each partition is backed by one or more files stored on an {{IFileSystem}}. The whole dataset is specified as a set of such files and/or path expressions with {{*}} and similar wildcards.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)