Posted to dev@reef.apache.org by "Markus Weimer (JIRA)" <ji...@apache.org> on 2017/04/11 22:41:41 UTC
[jira] [Commented] (REEF-1771) Enable IDistributedDataSet in .NET for Parquet files
[ https://issues.apache.org/jira/browse/REEF-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965066#comment-15965066 ]
Markus Weimer commented on REEF-1771:
-------------------------------------
One question I have is how to deal with partitioning *within* a Parquet file. Ideally, we'd be able to use a single Parquet file and have individual tasks each process one or more of the partitions inside it. Is that being considered here?
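To make the question concrete, here is a minimal sketch of the kind of assignment I have in mind. It is written in Python for brevity and is not REEF code: the function name `assign_partitions` is hypothetical, and it assumes the partitions inside the file (for Parquet, presumably its row groups) can be addressed by a zero-based index.

```python
from typing import List


def assign_partitions(num_partitions: int, num_tasks: int) -> List[range]:
    """Split partition indices 0..num_partitions-1 into contiguous ranges.

    Hypothetical helper: returns one range per task. Sizes differ by at
    most one, and a task gets an empty range if there are more tasks than
    partitions.
    """
    if num_partitions < 0:
        raise ValueError("num_partitions must be non-negative")
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")

    # Every task gets `base` partitions; the first `extra` tasks get one more.
    base, extra = divmod(num_partitions, num_tasks)

    ranges = []
    start = 0
    for task in range(num_tasks):
        size = base + (1 if task < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # One file with 10 partitions shared by 3 tasks.
    for task_id, parts in enumerate(assign_partitions(10, 3)):
        print(f"task {task_id}: partitions {list(parts)}")
```

Contiguous ranges are used here only because they would let each task read one consecutive region of the file; any other mapping from tasks to partition indices would serve the same purpose.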
> Enable IDistributedDataSet in .NET for Parquet files
> ----------------------------------------------------
>
> Key: REEF-1771
> URL: https://issues.apache.org/jira/browse/REEF-1771
> Project: REEF
> Issue Type: New Feature
> Components: REEF
> Affects Versions: 0.16
> Reporter: Shouheng Yi
> Priority: Minor
>
> We want to enable REEF to ingest distributed Parquet files and represent them as {{IDistributedDataSet}} in .NET. This allows us to better integrate with existing machine learning frameworks that rely heavily on the Parquet format.
> An initial step has been taken in [REEF-1765]. Future work will be tracked as sub-items toward this goal.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)