Posted to dev@metamodel.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/08/07 09:51:46 UTC

[jira] [Commented] (METAMODEL-163) Composite/directory Resource for local files and HDFS files

    [ https://issues.apache.org/jira/browse/METAMODEL-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661460#comment-14661460 ] 

ASF GitHub Bot commented on METAMODEL-163:
------------------------------------------

GitHub user LosD opened a pull request:

    https://github.com/apache/metamodel/pull/37

    Adds ability to read folders from FileResource and HdfsResource

    This allows FileResource and HdfsResource to use a folder as a complete resource. Subfolders are skipped; only the files in the folder itself are used. Before the files are opened, they are sorted alphabetically by name (strictly speaking, by the natural ordering of the File/FileStatus objects, which amounts to the pathname/URL sorted alphabetically).
    
    Fixes METAMODEL-163
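
For readers skimming this notification, the behaviour described above (skip subfolders, order the files by the natural ordering of File, read them as one resource) could be sketched roughly as follows for the local-file case. This is only an illustration using plain java.io; the class and method names (FolderReadSketch, listPartFiles, openConcatenated) are hypothetical and are not taken from the pull request, whose actual implementation may differ.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper (not MetaModel code): treats a local folder as one logical stream. */
final class FolderReadSketch {

    /** Returns the regular files directly inside 'folder', sorted by File's natural ordering. */
    static List<File> listPartFiles(File folder) {
        File[] children = folder.listFiles();
        if (children == null) {
            throw new IllegalArgumentException("Not a readable directory: " + folder);
        }
        List<File> files = new ArrayList<>();
        for (File child : children) {
            if (child.isFile()) { // subfolders are skipped, not descended into
                files.add(child);
            }
        }
        Collections.sort(files); // File is Comparable<File>: compares abstract pathnames
        return files;
    }

    /** Opens the part files lazily, one after another, as a single concatenated stream. */
    static InputStream openConcatenated(File folder) {
        final Iterator<File> remaining = listPartFiles(folder).iterator();
        Enumeration<InputStream> streams = new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return remaining.hasNext();
            }

            @Override
            public InputStream nextElement() {
                try {
                    return new FileInputStream(remaining.next());
                } catch (FileNotFoundException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
        return new SequenceInputStream(streams);
    }
}
```

Note that File's natural ordering compares pathnames lexicographically and is case-sensitive on UNIX-like systems but not on Windows, so "alphabetically" depends slightly on the platform. An HDFS variant would presumably list FileStatus objects instead, which is not shown here.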

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/LosD/metamodel feature/METAMODEL-163-folder-resource

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metamodel/pull/37.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #37
    
----
commit f11f9607055ce931601cf417d9de009461efddcf
Author: Dennis Du Krøger <de...@humaninference.com>
Date:   2015-08-07T07:45:15Z

    Adds ability to read folders from FileResource and HdfsResource
    
    This allows FileResource and HdfsResource to use a folder as a complete resource. Subfolders are skipped; only the files in the folder itself are used. Before the files are opened, they are sorted alphabetically by name (strictly speaking, by the natural ordering of the File/FileStatus objects, which amounts to the pathname/URL sorted alphabetically).

----


> Composite/directory Resource for local files and HDFS files
> -----------------------------------------------------------
>
>                 Key: METAMODEL-163
>                 URL: https://issues.apache.org/jira/browse/METAMODEL-163
>             Project: Apache MetaModel
>          Issue Type: Improvement
>            Reporter: Kasper Sørensen
>
> An increasingly common pattern in representing data is to have a directory of files in the same format that can be appended together to form a complete dataset. I see this especially in Hadoop scenarios, where reducers as well as Spark will usually create such "part" files in a directory and treat that directory almost as a logical file.
> I don't know if we can generalize this or if we need two separate implementations. But at least I would love to have a Resource implementation like this: Given a (local or HDFS) path that points to a directory, or maybe also to a wildcard-enabled expression, I would want to have a single Resource object that represents all the corresponding files in that directory/pattern.
> This would not only provide us with better interoperability with Hadoop result data, but it will also actually solve a long-standing request (in our company at least) to support multiple CSV files in one logical CsvDataContext.
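
The "wildcard-enabled expression" idea in the issue text could, for local files, be approximated with the glob support in java.nio.file. The sketch below is an assumption about how such matching might look, not something the issue or the pull request specifies; GlobMatchSketch and listMatching are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not MetaModel code): resolves a glob inside one directory. */
final class GlobMatchSketch {

    /** Returns regular files in 'directory' whose file name matches 'glob', sorted by path. */
    static List<Path> listMatching(Path directory, String glob) throws IOException {
        List<Path> matches = new ArrayList<>();
        // newDirectoryStream applies the glob to each entry's file name only
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(directory, glob)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) { // ignore matching subdirectories
                    matches.add(entry);
                }
            }
        }
        Collections.sort(matches); // Path is Comparable<Path>
        return matches;
    }
}
```

For example, listMatching(dir, "part-*") would pick up Hadoop-style part files while leaving marker files such as _SUCCESS out of the logical dataset.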



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)