Posted to common-dev@hadoop.apache.org by "Chris Douglas (JIRA)" <ji...@apache.org> on 2007/12/14 01:21:43 UTC
[jira] Updated: (HADOOP-2424) lzop compatible CompressionCodec

     [ https://issues.apache.org/jira/browse/HADOOP-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-2424:
----------------------------------

    Description: 
LzoCodec currently outputs at most {{io.compression.codec.lzo.buffersize}} bytes (default 64k), less the compression overhead, per write (HADOOP-2402), in the following format:

{noformat}
[uncompressed block length(32)]
[compressed block length(32)]
[compressed block]
{noformat}
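
As a rough illustration of the layout above, here is a minimal Python sketch that frames and parses one such block. It assumes the "(32)" fields are unsigned 32-bit integers in big-endian byte order, which the description does not state. The function names are hypothetical, and the compressed payload is treated as opaque bytes because the Python standard library has no LZO binding.

```python
import struct


def frame_lzocodec_block(uncompressed_len: int, compressed: bytes) -> bytes:
    """Frame one block as [uncompressed length][compressed length][compressed block]."""
    # ">II" packs two unsigned 32-bit integers, big-endian (byte order is an assumption).
    return struct.pack(">II", uncompressed_len, len(compressed)) + compressed


def parse_lzocodec_block(buf: bytes, offset: int = 0):
    """Return (uncompressed_len, compressed_bytes, next_offset) for the block at offset."""
    uncompressed_len, compressed_len = struct.unpack_from(">II", buf, offset)
    start = offset + 8  # two 4-byte length fields precede the payload
    end = start + compressed_len
    if end > len(buf):
        raise ValueError("truncated block")
    return uncompressed_len, buf[start:end], end
```

A reader would call parse_lzocodec_block repeatedly, passing the returned next_offset back in, until the buffer is exhausted.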

lzop (lzo-backed command-line utility) writes blocks in the following format:

{noformat}
[uncompressed block length(32)]
[compressed block length (32)]
[Adler-32|CRC-32 checksum of uncompressed block (32)]
[Adler-32|CRC-32 checksum of compressed block (32)]
[compressed block]
{noformat}
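
To make the difference concrete, the following Python sketch frames a block in the lzop-style layout listed above and verifies the compressed-block checksum on read. It is a sketch under assumptions rather than a reimplementation of lzop: it assumes big-endian unsigned 32-bit fields, assumes both checksums are always present, and uses zlib's Adler-32 and CRC-32 as stand-ins for whatever lzop computes. The function names are hypothetical, the payload is opaque bytes (no LZO compression is performed), and the file header is not handled.

```python
import struct
import zlib

# Checksum choices named in the block layout; zlib's versions are assumed to match.
CHECKSUMS = {"adler32": zlib.adler32, "crc32": zlib.crc32}


def frame_lzop_block(uncompressed: bytes, compressed: bytes,
                     checksum: str = "adler32") -> bytes:
    """Frame one block: two lengths, two checksums, then the compressed bytes."""
    fn = CHECKSUMS[checksum]
    header = struct.pack(
        ">IIII",                      # four unsigned 32-bit big-endian fields (assumed)
        len(uncompressed),
        len(compressed),
        fn(uncompressed) & 0xFFFFFFFF,
        fn(compressed) & 0xFFFFFFFF,
    )
    return header + compressed


def parse_lzop_block(buf: bytes, offset: int = 0, checksum: str = "adler32"):
    """Return (uncompressed_len, uncompressed_checksum, compressed_bytes, next_offset)."""
    fn = CHECKSUMS[checksum]
    ulen, clen, usum, csum = struct.unpack_from(">IIII", buf, offset)
    start = offset + 16  # four 4-byte fields precede the payload
    end = start + clen
    if end > len(buf):
        raise ValueError("truncated block")
    compressed = buf[start:end]
    if fn(compressed) & 0xFFFFFFFF != csum:
        raise ValueError("compressed-block checksum mismatch")
    # The uncompressed checksum can only be checked after decompression,
    # so it is returned to the caller instead of being verified here.
    return ulen, usum, compressed, end
```

Compared with the LzoCodec layout, each block carries 8 extra bytes of checksum data, which is why a reader for one format cannot consume the other unchanged.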

The file also begins with an additional header of roughly 32 bytes. I don't know of a published standard for the format, but the lzop source should suffice as a reference.

Since we're using ".lzo" as the default extension, it's worth considering compatibility with lzop, though not necessarily for every lzo-compressed block. For example, SequenceFiles should keep using the existing LzoCodec format.

  was:
LzoCodec currently outputs at most {{io.compression.codec.lzo.buffersize}} bytes (default 64k), less the compression overhead, per write (HADOOP-2402), in the following format:

{noformat}
[compressed block length(32)]
[compressed block]
{noformat}

lzop (lzo-backed command-line utility) writes blocks in the following format:

{noformat}
[uncompressed block length(32)]
[compressed block length (32)]
[Adler-32|CRC-32 checksum of uncompressed block (32)]
[Adler-32|CRC-32 checksum of compressed block (32)]
[compressed block]
{noformat}

The file also begins with an additional header of roughly 32 bytes. I don't know of a published standard for the format, but the lzop source should suffice as a reference.

Since we're using ".lzo" as the default extension, it's worth considering compatibility with lzop, though not necessarily for every lzo-compressed block. For example, SequenceFiles should keep using the existing LzoCodec format.


> lzop compatible CompressionCodec
> --------------------------------
>
>                 Key: HADOOP-2424
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2424
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: io, native
>            Reporter: Chris Douglas
>
