Posted to user@hive.apache.org by Raj Hadoop <ha...@yahoo.com> on 2013/11/13 16:13:50 UTC

Compression for a HDFS text file - Hive External Partition Table

Hi,
  
1) My requirement is to load a file (a tar.gz archive that contains several tab-separated-values files; one of them is the main file, which holds a large amount of data, about 10 GB per day) into an external partitioned Hive table.
 
2) What I am doing now: I have automated the process so that it extracts the tar.gz file, takes out the main data file (the 10 GB text file), and loads it into HDFS as a plain text file. A rough sketch of the table definition and the load script is at the end of this mail.
 
3) I want to compress these files. What is the procedure for doing that?
 
4) Do I need to use some utility to compress the big data file before loading it into HDFS? And do I also need to define an InputFormat for the HDFS file format through a Java program?
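
For reference, here is roughly what the one-time table definition and the current daily load script look like. All paths, table names, and column names below are simplified examples, not the real ones:

#!/bin/bash
# Sketch of the automated daily load from 2) above (names are examples).

DAY=2013-11-13
SRC=/data/incoming/feed_${DAY}.tar.gz
WORK=/tmp/feed_${DAY}

# One-time table definition (simplified), created earlier via hive:
#   CREATE EXTERNAL TABLE feed (col1 STRING, col2 STRING)
#   PARTITIONED BY (dt STRING)
#   ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
#   STORED AS TEXTFILE
#   LOCATION '/user/raj/feed';

# Extract the tarball and take out the main ~10 GB tab-separated file
mkdir -p "$WORK"
tar -xzf "$SRC" -C "$WORK"
MAIN="$WORK/main_data.tsv"

# Load the plain text file into the partition directory on HDFS
hdfs dfs -mkdir -p /user/raj/feed/dt=${DAY}
hdfs dfs -put "$MAIN" /user/raj/feed/dt=${DAY}/

# Register the new partition with the external table
hive -e "ALTER TABLE feed ADD IF NOT EXISTS PARTITION (dt='${DAY}');"

For 3) and 4), what I am not sure about is whether it would be enough to gzip the main file before the put (so that main_data.tsv.gz lands in the partition directory), or whether Hive needs a custom InputFormat to read compressed text.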
 
Regards,
Raj