Posted to mapreduce-user@hadoop.apache.org by Mapred Learn <ma...@gmail.com> on 2011/02/17 01:24:27 UTC

hadoop fs -put vs writing text files to hadoop as sequence files

Hi,
I have to upload several terabytes of data stored as text files.

What would be a good option:

i) using hadoop fs -put to copy the text files directly onto HDFS.

ii) copying the text files onto HDFS as sequence files? How much extra time
would this take compared with (i)?

Thanks,
Jimmy

Re: hadoop fs -put vs writing text files to hadoop as sequence files

Posted by Chase Bradford <ch...@gmail.com>.
We use sequence files for storing text data, and you definitely notice the cost of compressing client side while streaming to HDFS. If I remember correctly, it took about 10x as long. That drove us to use writer threads that fed off a single input stream a few thousand lines at a time and wrote to an HDFS directory with the desired name.
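A minimal sketch of the writer-thread pattern described above, assuming one reader and a fixed pool of writer threads. To keep it self-contained it uses only the JDK: each batch of lines is GZIP-compressed into a local part file, standing in for the compressed sequence file that would be written to HDFS. The class name `ParallelCompressedWriter`, the `part-NNNNN.gz` naming, and the backpressure policy are illustrative assumptions, not the poster's actual code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: one reader, several compressing writer threads.
// GZIP part files on local disk stand in for compressed sequence files on HDFS.
class ParallelCompressedWriter {

    /** Splits {@code in} into batches of {@code batchSize} lines, compresses each batch
     *  on a worker thread into {@code outDir/part-NNNNN.gz}, and returns the line count. */
    static long write(BufferedReader in, Path outDir, int batchSize, int threads)
            throws IOException, InterruptedException {
        Files.createDirectories(outDir);
        // Bounded queue + CallerRunsPolicy gives backpressure: when all writers are
        // busy, the reading thread compresses a batch itself instead of buffering more.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(threads, threads, 0L,
                TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(threads),
                new ThreadPoolExecutor.CallerRunsPolicy());
        List<Future<Integer>> pending = new ArrayList<>();
        try {
            List<String> batch = new ArrayList<>(batchSize);
            int part = 0;
            String line;
            while ((line = in.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    pending.add(submit(pool, batch, outDir.resolve(partName(part++))));
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) { // trailing partial batch
                pending.add(submit(pool, batch, outDir.resolve(partName(part))));
            }
            long total = 0;
            for (Future<Integer> f : pending) {
                try {
                    total += f.get(); // waits for the batch; rethrows any writer failure
                } catch (ExecutionException e) {
                    throw new IOException("writer thread failed", e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Zero-padded names keep the parts in input order when listed lexically.
    private static String partName(int part) {
        return String.format(Locale.ROOT, "part-%05d.gz", part);
    }

    // Compresses one batch into its own part file on a pool thread.
    private static Future<Integer> submit(ThreadPoolExecutor pool, List<String> lines, Path target) {
        return pool.submit(() -> {
            try (Writer w = new BufferedWriter(new OutputStreamWriter(
                    new GZIPOutputStream(Files.newOutputStream(target)), StandardCharsets.UTF_8))) {
                for (String l : lines) {
                    w.write(l);
                    w.write('\n');
                }
            }
            return lines.size();
        });
    }
}
```

In a real upload, `submit` would presumably open an `org.apache.hadoop.io.SequenceFile.Writer` on HDFS instead of a `GZIPOutputStream`, and `outDir` would be the HDFS directory named after the original file. The bounded queue matters at terabyte scale: without it, the reader could queue batches far faster than the writers can compress them and exhaust memory.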
