Posted to hdfs-user@hadoop.apache.org by Mix Nin <pi...@gmail.com> on 2013/05/24 23:46:48 UTC

Single Output file from STORE command

The Pig STORE command produces multiple output files. I want a single output
file, so I tried the following command:

STORE (foreach (group NoNullData all) generate flatten($1))  into 'xxxx';

This command produces a single file, but it also forces the job to use a
single reducer, which kills performance.

How can I get a single output file without this limitation?

Normally the STORE command produces multiple output files. Apart from those, I
also see a "_SUCCESS" file in the output directory. I am also generating a
metadata file (using PigStorage('\t', '-schema')) in the output directory.

I thought of using getmerge as follows:

hadoop fs -getmerge <dir_of_input_files> <local file>

But this approach has drawbacks:
1) It requires removing the files other than data files from the HDFS directory.
2) It creates the single file in a local directory, not in HDFS.
3) I then need to move the file from the local directory back to HDFS, which
may take additional time depending on the size of the file.
4) I need to put back the files that I removed in step 1.
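
For concreteness, the four steps above might look roughly like the sketch below. It is only an illustration, not a recommended solution. The function name merge_to_hdfs, the "_aside" holding directory, and all paths are hypothetical. The metadata file name ".pig_schema" is an assumption about what PigStorage writes with '-schema'. By default the script only prints the commands (RUN=echo); set RUN to an empty string to actually run them.

```shell
#!/bin/sh
# Sketch of the four-step getmerge workaround. All paths are placeholders.
# RUN defaults to "echo" (dry run); export RUN= to execute for real.
RUN="${RUN-echo}"

merge_to_hdfs() {
    src_dir=$1                    # HDFS directory written by STORE
    local_file=$2                 # temporary merged file on local disk
    dest_file=$3                  # final single file in HDFS
    aside_dir="${src_dir}_aside"  # hypothetical holding directory

    # Step 1: move the non-data files out of the output directory.
    # (.pig_schema is an assumed name for the '-schema' metadata file.)
    $RUN hadoop fs -mkdir "$aside_dir" || return 1
    $RUN hadoop fs -mv "$src_dir/_SUCCESS" "$aside_dir/" || return 1
    $RUN hadoop fs -mv "$src_dir/.pig_schema" "$aside_dir/" || return 1

    # Step 2: concatenate the remaining part files into one local file.
    $RUN hadoop fs -getmerge "$src_dir" "$local_file" || return 1

    # Step 3: copy the merged local file back into HDFS.
    $RUN hadoop fs -put "$local_file" "$dest_file" || return 1

    # Step 4: put the files from step 1 back where they were.
    $RUN hadoop fs -mv "$aside_dir/_SUCCESS" "$src_dir/" || return 1
    $RUN hadoop fs -mv "$aside_dir/.pig_schema" "$src_dir/" || return 1
}

# Example dry run with hypothetical paths.
merge_to_hdfs /user/me/xxxx /tmp/xxxx_merged.tsv /user/me/xxxx_merged.tsv
```

Written out this way, the cost is easy to see: the data crosses the network twice (HDFS to local disk in step 2, local disk back to HDFS in step 3), and the local machine needs enough free space to hold the whole merged file.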


Is there a more efficient way to solve this problem?

Thanks