Posted to dev@crunch.apache.org by "Josh Wills (JIRA)" <ji...@apache.org> on 2013/08/10 01:32:47 UTC

[jira] [Updated] (CRUNCH-165) Pipelines should automatically use CombineFileInputFormat where input consists of many small files

     [ https://issues.apache.org/jira/browse/CRUNCH-165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills updated CRUNCH-165:
------------------------------

    Attachment: CRUNCH-165-v3.patch

I took yet another crack at this today, and I think I got it this time. :)
                
> Pipelines should automatically use CombineFileInputFormat where input consists of many small files
> --------------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-165
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-165
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.4.0
>            Reporter: Dave Beech
>            Assignee: Josh Wills
>         Attachments: CRUNCH-165-jwills.patch, CRUNCH-165.patch, CRUNCH-165-v3.patch
>
>
> Hive introduced a feature in HIVE-74 that uses CombineFileInputFormat when the input data consists of many small files, which makes the resulting MapReduce jobs more efficient by giving each mapper more data to process. This would be a nice feature for Crunch to have, too.
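
To illustrate the idea in the description, here is a rough sketch of how many small files can be packed into fewer, larger splits so that each mapper gets more data. It is not the code from the attached patches and it does not use the Hadoop API: the class SmallFileCombiner, the method combine, and the size limit are hypothetical names chosen for this example, and the real CombineFileInputFormat also takes node and rack locality into account, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: greedy packing of small files into combined splits.
class SmallFileCombiner {

    /**
     * Groups file sizes (in bytes) into splits, in input order, so that each
     * split holds at most maxSplitBytes. A single file larger than the limit
     * still gets a split of its own.
     */
    static List<List<Long>> combine(List<Long> fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current split if adding this file would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed workload: 1000 files of 1 MiB each, with a 128 MiB split limit.
        List<Long> sizes = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            sizes.add(1L << 20);
        }
        List<List<Long>> splits = combine(sizes, 128L << 20);
        System.out.println(sizes.size() + " files -> " + splits.size() + " splits");
    }
}
```

With one split per file, the assumed workload would start 1000 map tasks; packed this way it needs 8.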

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira