Posted to common-user@hadoop.apache.org by maha <ma...@umail.ucsb.edu> on 2011/03/05 04:33:13 UTC
Using SequenceFile instead of TextFiles
Hi,
I have 2 questions:
1) Is a SequenceFile more efficient than a text file for input? I think text files are processed by TextInputFormat into SequenceFiles inside Hadoop. So will SequenceFiles (i.e., binary input files) be more efficient?
2) If I decide to use SequenceFiles as the input format, do I need to stick to the header protocol defined in http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html ?
Thanks everyone,
Maha
Re: Using SequenceFile instead of TextFiles
Posted by maha <ma...@umail.ucsb.edu>.
Thanks again, Harsh. I actually got the book two days ago but haven't had time to read it yet.
Maha
On Mar 4, 2011, at 7:54 PM, Harsh J wrote:
> Hi,
>
> On Sat, Mar 5, 2011 at 9:03 AM, maha <ma...@umail.ucsb.edu> wrote:
>> Hi,
>>
>> I have 2 questions:
>>
>> 1) Is a SequenceFile more efficient than a text file for input? I think text files are processed by TextInputFormat into SequenceFiles inside Hadoop. So will SequenceFiles (i.e., binary input files) be more efficient?
>
> Depends on what your scenario is.
>
>> 2) If I decide to use SequenceFiles as the input format, do I need to stick to the header protocol defined in http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html ?
>
> No. You would use SequenceFileInputFormat and SequenceFileOutputFormat classes.
>
> May I suggest reading a good Hadoop book that covers the little,
> scattered stuff like this, neatly? I like Tom White's Hadoop: The
> Definitive Guide :)
>
> --
> Harsh J
> www.harshj.com
Re: Using SequenceFile instead of TextFiles
Posted by Harsh J <qw...@gmail.com>.
Hi,
On Sat, Mar 5, 2011 at 9:03 AM, maha <ma...@umail.ucsb.edu> wrote:
> Hi,
>
> I have 2 questions:
>
> 1) Is a SequenceFile more efficient than a text file for input? I think text files are processed by TextInputFormat into SequenceFiles inside Hadoop. So will SequenceFiles (i.e., binary input files) be more efficient?

Depends on what your scenario is.

> 2) If I decide to use SequenceFiles as the input format, do I need to stick to the header protocol defined in http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html ?

No. You would use the SequenceFileInputFormat and SequenceFileOutputFormat classes.

May I suggest reading a good Hadoop book that covers the little,
scattered stuff like this, neatly? I like Tom White's Hadoop: The
Definitive Guide :)
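
To make the answer to question 2 concrete, here is a minimal sketch of a job that reads and writes SequenceFiles through those two classes, so the file header is never handled by hand. It assumes the newer org.apache.hadoop.mapreduce API with Job.getInstance (Hadoop 2.x or later; older releases construct the Job directly) and Text keys with IntWritable values. The class name SequenceFileJobSketch, the configure helper, the job name, and the key/value types are hypothetical choices for illustration, not something taken from this thread.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical example class: wires SequenceFile input and output into a job.
public class SequenceFileJobSketch {

    // Builds a job that reads SequenceFiles from `in` and writes SequenceFiles to `out`.
    public static Job configure(Configuration conf, Path in, Path out) throws IOException {
        Job job = Job.getInstance(conf, "sequence-file-sketch");
        job.setJarByClass(SequenceFileJobSketch.class);

        // The input format parses the SequenceFile header and records for us.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        // The output format writes the header and records for us.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        // Assumed key/value types; they must match what the input files contain,
        // because no mapper or reducer is set and the defaults pass records through.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = configure(new Configuration(), new Path(args[0]), new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the Hadoop libraries on the classpath, this would be launched with an input directory and an output directory as its two arguments. Whether it beats plain text input still depends on the scenario, as noted above.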
--
Harsh J
www.harshj.com