Posted to mapreduce-user@hadoop.apache.org by Pedro Costa <ps...@gmail.com> on 2011/02/14 16:21:02 UTC

Map output files are SequenceFileFormat

Hi,

1 - Are the map output files always of the type SequenceFileFormat?

2 - Does that mean each file contains a header with the following fields?
# version - A byte array: 3 bytes of magic header 'SEQ', followed by 1
byte of actual version no. (e.g. SEQ4 or SEQ6)
# keyClassName - String
# valueClassName - String
# compression - A boolean which specifies if compression is turned on
for keys/values in this file.
# blockCompression - A boolean which specifies if block compression is
turned on for keys/values in this file.
# compressor class - The classname of the CompressionCodec which is
used to compress/decompress keys and/or values in this SequenceFile
(if compression is enabled).
# metadata - SequenceFile.Metadata for this file (key/value pairs)
# sync - A sync marker to denote end of the header.
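
To make the field order above concrete, here is a minimal sketch that builds such a header in memory and parses it back. It is not taken from the Hadoop source. It assumes a version 6 layout, strings stored as a one-byte length followed by UTF-8 bytes (which only holds for strings shorter than 128 bytes), booleans stored as one byte, metadata stored as a 4-byte big-endian pair count followed by the pairs, and a 16-byte sync marker. The helper names (build_header, parse_header, and so on) are hypothetical.

```python
import io
import struct

SYNC_SIZE = 16  # assumed length of the sync marker, in bytes


def write_short_string(out, text):
    """Write a UTF-8 string with a one-byte length prefix (sketch: < 128 bytes)."""
    raw = text.encode("utf-8")
    if len(raw) > 127:
        raise ValueError("this sketch only handles strings shorter than 128 bytes")
    out.write(bytes([len(raw)]))
    out.write(raw)


def read_short_string(stream):
    """Read a string written by write_short_string."""
    length = stream.read(1)[0]
    if length > 127:
        raise ValueError("multi-byte length prefixes are not handled here")
    return stream.read(length).decode("utf-8")


def build_header(key_class, value_class, codec=None, block=False, metadata=None):
    """Assemble header bytes in the field order listed above (assumed layout)."""
    out = io.BytesIO()
    out.write(b"SEQ" + bytes([6]))                    # magic + version byte
    write_short_string(out, key_class)
    write_short_string(out, value_class)
    out.write(b"\x01" if codec else b"\x00")          # compression flag
    out.write(b"\x01" if block else b"\x00")          # blockCompression flag
    if codec:
        write_short_string(out, codec)                # only present when compressed
    pairs = metadata or {}
    out.write(struct.pack(">i", len(pairs)))          # number of metadata pairs
    for name, value in pairs.items():
        write_short_string(out, name)
        write_short_string(out, value)
    out.write(b"\x00" * SYNC_SIZE)                    # placeholder sync marker
    return out.getvalue()


def parse_header(data):
    """Parse bytes produced by build_header into a dict."""
    stream = io.BytesIO(data)
    if stream.read(3) != b"SEQ":
        raise ValueError("bad magic, not a SequenceFile header")
    header = {"version": stream.read(1)[0]}
    header["key_class"] = read_short_string(stream)
    header["value_class"] = read_short_string(stream)
    header["compression"] = stream.read(1) != b"\x00"
    header["block_compression"] = stream.read(1) != b"\x00"
    if header["compression"]:
        header["codec"] = read_short_string(stream)
    (pair_count,) = struct.unpack(">i", stream.read(4))
    header["metadata"] = {}
    for _ in range(pair_count):
        name = read_short_string(stream)
        header["metadata"][name] = read_short_string(stream)
    header["sync"] = stream.read(SYNC_SIZE)
    return header


sample = build_header("org.apache.hadoop.io.Text",
                      "org.apache.hadoop.io.IntWritable",
                      metadata={"owner": "pedro"})
header = parse_header(sample)
print(header["version"], header["key_class"], header["compression"])
```

Older versions (e.g. SEQ4) may not carry every field, so a parser for real files would have to branch on the version byte and handle longer length prefixes.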



Thanks,

-- 
Pedro

Re: Map output files are SequenceFileFormat

Posted by Harsh J <qw...@gmail.com>.
Hello,

On Mon, Feb 14, 2011 at 11:37 PM, Pedro Costa <ps...@gmail.com> wrote:
> And when the data of the map-intermediate files is compressed, it's
> still an IFile?

Yes. From my understanding, if compression is turned ON for IFile, the
output stream for writing the IFile is itself set as a compressing one
and all data written to the stream is compressed.

In contrast, in SequenceFiles, compression is done in blocks (of a
size set when the Writer is created), and keys are left uncompressed.
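
The difference between the two approaches can be illustrated with a small sketch. This is not Hadoop code: it uses Python's zlib as a stand-in codec, a made-up length-prefixed record encoding, and a block size counted in records purely for simplicity. The function names are hypothetical.

```python
import zlib


def encode(key, value):
    """Length-prefix a record so it can be split apart again after decompression."""
    return (len(key).to_bytes(4, "big") + key +
            len(value).to_bytes(4, "big") + value)


def compress_as_stream(records):
    """One compressor wraps the whole output, as described for IFile above."""
    compressor = zlib.compressobj()
    chunks = [compressor.compress(encode(k, v)) for k, v in records]
    chunks.append(compressor.flush())
    return b"".join(chunks)


def compress_in_blocks(records, records_per_block):
    """Records are buffered, and each buffer is compressed independently."""
    blocks = []
    for start in range(0, len(records), records_per_block):
        batch = records[start:start + records_per_block]
        raw = b"".join(encode(k, v) for k, v in batch)
        blocks.append(zlib.compress(raw))
    return blocks


records = [(b"key%03d" % i, b"value-%03d" % i) for i in range(100)]
plain = b"".join(encode(k, v) for k, v in records)

stream = compress_as_stream(records)
blocks = compress_in_blocks(records, records_per_block=25)

print(len(plain), len(stream), [len(b) for b in blocks])
```

With the stream variant a reader has to decompress from the beginning; with the block variant any single block can be decompressed on its own, which is what makes skipping around in the file practical.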

-- 
Harsh J
www.harshj.com

Re: Map output files are SequenceFileFormat

Posted by Pedro Costa <ps...@gmail.com>.
And when the data of the map-intermediate files is compressed, it's
still an IFile?

On Mon, Feb 14, 2011 at 4:44 PM, Harsh J <qw...@gmail.com> wrote:
> Hello,
>
> On Mon, Feb 14, 2011 at 8:51 PM, Pedro Costa <ps...@gmail.com> wrote:
>> Hi,
>>
>> 1 - The map output files are always of the type SequenceFileFormat?
>
> If you mean the Map-intermediate files, then no - they're IFiles.
> Otherwise, if your OutputFormat is set to a SequenceFileOutputFormat,
> then yes, files of that type would be created.
>
> Map-Reduce intermediate files are of the IFile format. It's not part
> of the public API, but you may read its implementation in
> src/java/org/apache/hadoop/mapred/IFile.java.
>
> SequenceFiles are quite similar, but are built for richer K-V file
> operations, such as skipping over keys, which are not really needed
> for IFiles, since those contain partitioned and sorted data.
>
> --
> Harsh J
> www.harshj.com
>



-- 
Pedro

Re: Map output files are SequenceFileFormat

Posted by Harsh J <qw...@gmail.com>.
Hello,

On Mon, Feb 14, 2011 at 8:51 PM, Pedro Costa <ps...@gmail.com> wrote:
> Hi,
>
> 1 - The map output files are always of the type SequenceFileFormat?

If you mean the Map-intermediate files, then no - they're IFiles.
Otherwise, if your OutputFormat is set to a SequenceFileOutputFormat,
then yes, files of that type would be created.

Map-Reduce intermediate files are of the IFile format. It's not part
of the public API, but you may read its implementation in
src/java/org/apache/hadoop/mapred/IFile.java.

SequenceFiles are quite similar, but are built for richer K-V file
operations, such as skipping over keys, which are not really needed
for IFiles, since those contain partitioned and sorted data.
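
As a rough illustration of why partitioned and sorted data needs no key skipping, here is a small sketch. It is not Hadoop code: CRC32 is only a deterministic stand-in for whatever partitioner a job uses, and the function names and sample records are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Stand-in partitioner: CRC32 of the key modulo the partition count."""
    return zlib.crc32(key) % num_partitions


def partition_and_sort(records, num_partitions):
    """Group records by partition, then sort each partition by key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    for part in partitions:
        part.sort(key=lambda record: record[0])
    return partitions


map_output = [(b"pear", 1), (b"apple", 1), (b"fig", 1), (b"apple", 1), (b"kiwi", 1)]
partitions = partition_and_sort(map_output, num_partitions=2)

for index, part in enumerate(partitions):
    # Each partition is consumed front to back; no lookup by key is needed.
    for key, value in part:
        print(index, key.decode(), value)
```

Each consumer reads its own partition sequentially from start to end, so a format for this data can do without the machinery for seeking to or skipping over individual keys.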

-- 
Harsh J
www.harshj.com