Posted to mapreduce-user@hadoop.apache.org by Mapred Learn <ma...@gmail.com> on 2011/02/17 01:24:27 UTC
hadoop fs -put vs writing text files to hadoop as sequence files
Hi,
I have to upload several terabytes of text files.
Which would be the better option:
i) using hadoop fs -put to copy the text files directly to HDFS, or
ii) writing the text files to HDFS as sequence files? How much extra time
would this take compared to (i)?
Thanks,
Jimmy
Re: hadoop fs -put vs writing text files to hadoop as sequence files
Posted by Chase Bradford <ch...@gmail.com>.
We use sequence files for storing text data, and you definitely notice the cost of compressing client side while streaming to HDFS. If I remember correctly, it took about 10x as long. That drove us to use writer threads that fed off a single input stream a few thousand lines at a time and wrote to an HDFS directory with the desired name.
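A minimal sketch of that writer-thread pattern, for illustration only. It is not our actual code and does not use the Hadoop API: each worker writes a gzip-compressed part file into a local directory as a stand-in for a compressed sequence file in an HDFS directory. The names parallel_upload, read_batches, BATCH_LINES and NUM_WRITERS, the part-file naming, and the batch size of 2000 are all assumptions made for the example.

```python
import gzip
import os
import queue
import threading
from itertools import islice

BATCH_LINES = 2000  # "a few thousand lines at a time"; the exact value is a guess
NUM_WRITERS = 4     # hypothetical number of writer threads


def read_batches(stream, batch_lines=BATCH_LINES):
    """Yield lists of up to batch_lines lines from a single input stream."""
    while True:
        batch = list(islice(stream, batch_lines))
        if not batch:
            return
        yield batch


def writer(worker_id, batches, out_dir):
    """Drain batches from the shared queue into this worker's own part file.

    The gzip file is a local stand-in; a real job would open a compressed
    sequence-file writer on HDFS here instead.
    """
    path = os.path.join(out_dir, "part-%05d.gz" % worker_id)
    with gzip.open(path, "wt", encoding="utf-8") as out:
        while True:
            batch = batches.get()
            if batch is None:  # sentinel: no more input for this worker
                break
            out.writelines(batch)


def parallel_upload(stream, out_dir, num_writers=NUM_WRITERS,
                    batch_lines=BATCH_LINES):
    """Feed one input stream to several compressing writer threads."""
    os.makedirs(out_dir, exist_ok=True)
    # A bounded queue keeps the reader from running far ahead of the writers.
    batches = queue.Queue(maxsize=num_writers * 2)
    threads = [
        threading.Thread(target=writer, args=(i, batches, out_dir))
        for i in range(num_writers)
    ]
    for t in threads:
        t.start()
    for batch in read_batches(stream, batch_lines):
        batches.put(batch)
    for _ in threads:
        batches.put(None)  # one sentinel per writer
    for t in threads:
        t.join()
    return sorted(os.listdir(out_dir))
```

Calling parallel_upload(open("input.txt"), "desired_name") would leave one part file per writer in the desired_name directory (both names are placeholders). Line order across part files is not preserved, since whichever writer is free takes the next batch.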
On Feb 16, 2011, at 4:24 PM, Mapred Learn <ma...@gmail.com> wrote:
> Hi,
> I have to upload some terabytes of data that is text files.
>
> What would be good option to do so:
>
> i) using hadoop fs -put to copy text files directly on hdfs.
>
> ii) copying text files as sequence files on hdfs ? What would be extra time in this case as opposed to (i).
>
> Thanks,
> Jimmy