Posted to user@hive.apache.org by "Chagarlamudi, Prasanth" <pr...@epsilon.com> on 2015/06/16 23:05:58 UTC

Merging small files in partitions

Hello,
I am looking for an efficient way to merge the small files in each Hive partition into one large file.
I came across Alter Table/Partition Concatenate (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionConcatenate), but the doc says it only works for RCFiles. I wish there were something similar for the TEXTFILE format.
Any suggestions?

Thanks in advance
Prasanth



________________________________

This e-mail and files transmitted with it are confidential, and are intended solely for the use of the individual or entity to whom this e-mail is addressed. If you are not the intended recipient, or the employee or agent responsible to deliver it to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you are not one of the named recipient(s) or otherwise have reason to believe that you received this message in error, please immediately notify sender by e-mail, and destroy the original message. Thank You.

Re: Merging small files in partitions

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Edward,
Can we do the same or a similar thing for Parquet files? Any pointers?
Regards,
Mohammad


     On Tuesday, June 16, 2015 2:35 PM, Edward Capriolo <ed...@gmail.com> wrote:
   

 https://github.com/edwardcapriolo/filecrush


Re: Merging small files in partitions

Posted by Edward Capriolo <ed...@gmail.com>.
https://github.com/edwardcapriolo/filecrush
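
For readers who want to see the basic idea behind merging TEXTFILE data: uncompressed, newline-delimited text files can be combined by simple concatenation. The following is only a minimal sketch on a local filesystem, not the filecrush tool and not a Hive feature. The function name `merge_partition`, the output name `merged.txt`, and the rule of skipping names that start with `.` or `_` are assumptions made for illustration.

```python
from pathlib import Path


def merge_partition(partition_dir, merged_name="merged.txt"):
    """Concatenate the small files in one partition directory into a single file.

    Hypothetical helper for illustration; assumes uncompressed,
    newline-delimited text on a local filesystem.
    """
    partition_dir = Path(partition_dir)
    target = partition_dir / merged_name
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    # Sort for a deterministic row order; skip hidden and marker files
    # such as _SUCCESS (an assumed convention for this sketch).
    small_files = sorted(
        p for p in partition_dir.iterdir()
        if p.is_file() and not p.name.startswith((".", "_"))
    )

    # Write to a hidden temporary file first so a half-written result
    # is never visible under the final name.
    tmp_path = partition_dir / f".{merged_name}.tmp"
    with tmp_path.open("wb") as out:
        for path in small_files:
            data = path.read_bytes()
            out.write(data)
            # Make sure the last row of one file does not fuse with
            # the first row of the next.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")

    # Publish the merged file, then remove the originals.
    tmp_path.rename(target)
    for path in small_files:
        path.unlink()
    return target
```

It could be called once per partition directory, for example `merge_partition("/tmp/warehouse/mytable/dt=2015-06-16")` (a made-up path). The sketch reads each small file fully into memory, which is reasonable only because the inputs are assumed to be small, and it would not be correct for formats with headers or footers such as RCFile or Parquet.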
