Posted to dev@avro.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/04/01 20:02:00 UTC

[jira] [Commented] (AVRO-2354) Add CombineAvroKeyValueFileInputFormat in avro-mapred to combine small avro keyvalue files into combineSplit

    [ https://issues.apache.org/jira/browse/AVRO-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807134#comment-16807134 ] 

ASF subversion and git services commented on AVRO-2354:
-------------------------------------------------------

Commit 7cd66387b7a249eabcfbd57fb651613d9eb1b8c4 in avro's branch refs/heads/master from suxingfate
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=7cd6638 ]

AVRO-2354: Add CombineAvroKeyValueFileInputFormat in avro-mapred to combine small avro keyvalue files into combineSplit


> Add CombineAvroKeyValueFileInputFormat in avro-mapred to combine small avro keyvalue files into combineSplit
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-2354
>                 URL: https://issues.apache.org/jira/browse/AVRO-2354
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Wang, Xinglong
>            Priority: Minor
>
> In our production environment, we generate Avro files to track user behavior events, and several new files are created every hour. Each day we run a MapReduce job to analyze them. With AvroKeyValueInputFormat, the job starts a large number of small mappers because the input consists of many small Avro files.
> A combine-file input format would be very helpful in this case.
> Hadoop already provides such implementations for sequence files and text files. This issue proposes a CombineAvroKeyValueFileInputFormat class that does the same for Avro key-value files.
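
To illustrate why combining helps, here is a minimal, self-contained sketch of the general idea: packing many small files into fewer combined splits so that fewer mappers are started. It is not the code from the commit and not Hadoop's actual algorithm (Hadoop's CombineFileInputFormat also takes node and rack locality into account). The class name CombineSplitSketch, the method combine, and the file sizes are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the Hadoop or Avro implementation. */
final class CombineSplitSketch {

    /**
     * Greedily packs file sizes (in bytes) into groups whose total does not
     * exceed maxSplitBytes. A file larger than the limit gets a group of its own.
     */
    static List<List<Long>> combine(List<Long> fileSizes, long maxSplitBytes) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current group if adding this file would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed workload: 24 hourly files of 4 MiB each, 32 MiB limit per combined split.
        long mib = 1024L * 1024L;
        List<Long> hourlyFiles = new ArrayList<>();
        for (int i = 0; i < 24; i++) {
            hourlyFiles.add(4 * mib);
        }
        List<List<Long>> splits = combine(hourlyFiles, 32 * mib);
        System.out.println("files (one mapper each without combining): " + hourlyFiles.size());
        System.out.println("combined splits (one mapper each): " + splits.size());
    }
}
```

With these assumed numbers, the 24 small files are packed into 3 combined splits, so the job would start 3 mappers instead of 24.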


