Posted to hdfs-user@hadoop.apache.org by Jiayu Ji <ji...@gmail.com> on 2013/12/14 05:37:23 UTC

How does mapreduce job determine the compress codec

Hi

I have a question about how a MapReduce job determines the compression
codec for files on HDFS. From what I read in the Definitive Guide (page 86), "the
CompressionCodecFactory provides a way of mapping a filename extension to a
CompressionCodec using its getCodec() method". I did a test with an
LZO-compressed file that did not have an .lzo extension, but the MapReduce
job was still able to pick the right codec. Does anyone know why? Thanks in advance.

Jiayu
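The extension lookup the book describes can be sketched in miniature. The following is plain Python standing in for Hadoop's Java API: the class names in the table are real Hadoop codec classes, but the lookup function itself is an illustration of the idea, not Hadoop's code.

```python
import posixpath

# Extension -> codec class name, mirroring the table that
# CompressionCodecFactory builds from the registered codecs.
CODEC_BY_EXTENSION = {
    ".gz": "org.apache.hadoop.io.compress.GzipCodec",
    ".bz2": "org.apache.hadoop.io.compress.BZip2Codec",
    ".lzo": "com.hadoop.compression.lzo.LzopCodec",
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
}

def get_codec(path):
    """Return the codec class for a path's extension, or None (uncompressed)."""
    _, ext = posixpath.splitext(path)
    return CODEC_BY_EXTENSION.get(ext)

print(get_codec("/data/logs/part-00000.gz"))  # gzip codec class name
print(get_codec("/data/logs/part-00000"))     # None: no extension, no codec
```

Note that by this mechanism alone, a file without a recognized extension would get no codec at all, which is exactly why the observed behavior in the question needs another explanation.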

Re: How does mapreduce job determine the compress codec

Posted by Jiayu Ji <ji...@gmail.com>.
Thanks Azuryy. That was exactly what I wanted to know.




-- 
Jiayu (James) Ji,

Cell: (312)823-7393


Re: How does mapreduce job determine the compress codec

Posted by Azuryy Yu <az...@gmail.com>.
Hi Jiayu,
For a SequenceFile input, the name of the CompressionCodec class is
serialized in the file header, so the SequenceFile reader knows which
compression algorithm to use. Thanks.
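This self-describing-header idea can be illustrated with a toy format. The layout below is a deliberate simplification for demonstration, not the real SequenceFile binary layout (though real SequenceFiles do start with the bytes 'SEQ' and do record the codec class name): the writer records the codec class name up front, so the reader never consults the filename.

```python
import io

MAGIC = b"SEQ"  # real SequenceFiles also begin with these bytes

def write_header(stream, codec_class):
    # Toy layout: magic, one length byte, then the codec class name.
    name = codec_class.encode("utf-8")
    stream.write(MAGIC + bytes([len(name)]) + name)

def read_codec(stream):
    """Recover the codec class name from the header alone."""
    assert stream.read(3) == MAGIC
    n = stream.read(1)[0]
    return stream.read(n).decode("utf-8")

buf = io.BytesIO()
write_header(buf, "org.apache.hadoop.io.compress.GzipCodec")
buf.seek(0)
print(read_codec(buf))  # prints the codec class name from the header
```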






Re: How does mapreduce job determine the compress codec

Posted by Jiayu Ji <ji...@gmail.com>.
Thanks Tao. I know I can tell it is an LZO file from the magic number.
What I am curious about is which class in Hadoop the MapReduce job uses to
determine the file's compression algorithm. At the end of the day, I am
trying to figure out whether all the inputs of a MapReduce job have to be
compressed with the same algorithm.




Re: How does mapreduce job determine the compress codec

Posted by Tao Xiao <xi...@gmail.com>.
I suggest you download the LZO-compressed file, no matter whether its file
name has an .lzo extension, open it as hex bytes with a tool like
UltraEdit, and have a look at its header contents.
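This hex-editor check can also be scripted. A small sketch that sniffs the leading bytes of a file: the gzip and bzip2 magic numbers below are standard, and the lzop magic is the one used by the lzop container format, which is what the Hadoop LZO codec typically reads.

```python
# Known magic numbers -> format name. Longest-prefix checks are simple
# startswith tests because none of these magics is a prefix of another.
MAGIC_NUMBERS = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\x89LZO\x00\r\n\x1a\n", "lzop"),
    (b"SEQ", "hadoop sequence file"),
]

def sniff(header_bytes):
    """Guess the compression format from the first bytes of a file."""
    for magic, name in MAGIC_NUMBERS:
        if header_bytes.startswith(magic):
            return name
    return "unknown"

print(sniff(b"\x1f\x8b\x08\x00"))  # -> gzip
```

In practice you would call `sniff(open(path, "rb").read(16))` on the downloaded file.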


