Posted to dev@reef.apache.org by "Markus Weimer (JIRA)" <ji...@apache.org> on 2017/04/11 22:41:41 UTC

[jira] [Commented] (REEF-1771) Enable IDistributedDataSet in .NET for Parquet files

    [ https://issues.apache.org/jira/browse/REEF-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965066#comment-15965066 ] 

Markus Weimer commented on REEF-1771:
-------------------------------------

One question I have is how to deal with partitioning *within* a Parquet file. Ideally, we'd want to be able to use a single Parquet file and have each task process one or more of the partitions inside it. Is that being considered here?
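
To make the question concrete, here is a minimal sketch (in Python, not REEF code) of how the partitions inside one file could be spread across tasks. It assumes the partitions can be enumerated and addressed by index; the function name `assign_partitions` and the round-robin policy are illustrative assumptions, not part of REEF or of any Parquet library.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, num_tasks: int) -> Dict[int, List[int]]:
    """Assign partition indices of a single file to tasks, round-robin.

    Hypothetical helper for illustration only: it assumes partitions are
    numbered 0..num_partitions-1 and tasks are numbered 0..num_tasks-1.
    """
    if num_partitions < 0:
        raise ValueError("num_partitions must be non-negative")
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")

    # Every task gets an entry, even if it ends up with no partitions.
    assignment: Dict[int, List[int]] = {task: [] for task in range(num_tasks)}
    for partition in range(num_partitions):
        assignment[partition % num_tasks].append(partition)
    return assignment
```

With such a mapping, each task would open the same file and read only the partitions assigned to it, so the number of tasks would not have to match the number of files.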

> Enable IDistributedDataSet in .NET for Parquet files
> ----------------------------------------------------
>
>                 Key: REEF-1771
>                 URL: https://issues.apache.org/jira/browse/REEF-1771
>             Project: REEF
>          Issue Type: New Feature
>          Components: REEF
>    Affects Versions: 0.16
>            Reporter: Shouheng Yi
>            Priority: Minor
>
> We want to enable REEF to ingest distributed Parquet files and represent them as {{IDistributedDataSet}} in .NET. This allows us to integrate better with existing machine learning frameworks that rely heavily on the Parquet format.
> An initial step has been taken in [REEF-1765]. Future work will be tracked as sub-items toward this goal.


