Posted to mapreduce-user@hadoop.apache.org by Tom Brown <to...@gmail.com> on 2014/01/28 19:53:05 UTC

HDFS copyToLocal and get crc option

I am archiving a large amount of data out of my HDFS file system to a
separate shared storage solution (there is not much HDFS space left in my
cluster, and upgrading it is not an option right now).

I understand that HDFS internally manages checksums and that a read won't
succeed if the data doesn't match the CRC, so I'm not worried about
corruption when reading from HDFS.

However, I want to store the HDFS CRC checksums alongside the data files
after exporting them. I thought the "hadoop dfs -copyToLocal -crc
<hdfs-source> <local-dest>" command would work, but it always gives me the
error "-crc option is not valid when source file system does not have crc
files".

Can someone explain what exactly that option does, and when (if ever) it
should be used?

Thanks in advance!

--Tom

Re: HDFS copyToLocal and get crc option

Posted by Tom Brown <to...@gmail.com>.
I am using default values for both. My version is 1.1.2, and the default
value for "dfs.block.size" (67108864) is evenly divisible by 512.

However, the online reference of default values for my version
(http://hadoop.apache.org/docs/r1.1.2/hdfs-default.html) doesn't list any
checksum-related settings.
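
In case it is useful, here is the kind of quick check I am using to see
which checksum-related keys my client configuration actually picks up (just
a sketch; the class name, the explicit addResource calls, and the key filter
are my own guesses, and the relevant default may be defined in
core-default.xml rather than hdfs-default.xml):

import java.util.Map;

import org.apache.hadoop.conf.Configuration;

public class FindChecksumSettings {
    public static void main(String[] args) {
        // core-default.xml and core-site.xml are loaded automatically; the
        // hdfs resources may need to be added explicitly depending on the
        // version and what is on the classpath.
        Configuration conf = new Configuration();
        conf.addResource("hdfs-default.xml");
        conf.addResource("hdfs-site.xml");

        // Print every effective property whose name mentions checksums or
        // block size.
        for (Map.Entry<String, String> entry : conf) {
            String key = entry.getKey();
            if (key.contains("checksum") || key.contains("block.size")) {
                System.out.println(key + " = " + entry.getValue());
            }
        }
    }
}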

Was the checksum feature added recently?

--Tom


On Fri, Jan 31, 2014 at 10:14 AM, praveenesh kumar <pr...@gmail.com> wrote:

> Hi Tom,
>
> My hint is that your block size should be a multiple of the CRC chunk
> size. Check your dfs.block.size property, convert it to bytes, and divide
> it by the checksum size that is set; usually the dfs.bytes-per-checksum
> property tells you this value, or you can get the checksum size from the
> error message you are getting.
>
> HDFS uses this checksum value to make sure the data doesn't get corrupted
> during transfer (due to loss of bytes, etc.).
>
> I hope setting your block size to a multiple of your CRC checksum size
> will solve your problem.
>
> Regards
> Prav
>
>
> On Fri, Jan 31, 2014 at 4:30 PM, Tom Brown <to...@gmail.com> wrote:
>
>> What is the right way to use the "-crc" option with hadoop dfs
>> -copyToLocal?
>>
>> Is this the wrong list?
>>
>> --Tom
>>
>>
>> On Tue, Jan 28, 2014 at 11:53 AM, Tom Brown <to...@gmail.com> wrote:
>>
>>> I am archiving a large amount of data out of my HDFS file system to a
>>> separate shared storage solution (There is not much HDFS space left in my
>>> cluster, and upgrading it is not an option right now).
>>>
>>> I understand that HDFS internally manages checksums and won't succeed if
>>> the data doesn't match the CRC, so I'm not worried about corruption when
>>> reading from HDFS.
>>>
>>> However, I want to store the HDFS crc calculations alongside the data
>>> files after exporting them. I thought the "hadoop dfs -copyToLocal -crc
>>> <hdfs-source> <local-dest>" command would work, but it always gives me the
>>> error "-crc option is not valid when source file system does not have crc
>>> files"
>>>
>>> Can someone explain what exactly that option does, and when (if ever) it
>>> should be used?
>>>
>>> Thanks in advance!
>>>
>>> --Tom
>>>
>>
>>
>

Re: HDFS copyToLocal and get crc option

Posted by praveenesh kumar <pr...@gmail.com>.
Hi Tom,

My hint is that your block size should be a multiple of the CRC chunk
size. Check your dfs.block.size property, convert it to bytes, and divide
it by the checksum size that is set; usually the dfs.bytes-per-checksum
property tells you this value, or you can get the checksum size from the
error message you are getting.
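
Something like this quick check is what I mean (just a sketch; the class
name is made up and the property keys vary between Hadoop versions, so
treat the key names below as guesses):

import org.apache.hadoop.conf.Configuration;

public class BlockSizeCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Block size in bytes (dfs.block.size is the 1.x key; newer
        // releases use dfs.blocksize).
        long blockSize = conf.getLong("dfs.block.size", 64L * 1024 * 1024);

        // Checksum chunk size; the key name varies across versions, so try
        // both and fall back to the usual default of 512.
        long bytesPerChecksum = conf.getLong("dfs.bytes-per-checksum",
                conf.getLong("io.bytes.per.checksum", 512));

        System.out.println("block size         = " + blockSize);
        System.out.println("bytes per checksum = " + bytesPerChecksum);
        System.out.println("evenly divisible?    "
                + (blockSize % bytesPerChecksum == 0));
    }
}

If that prints false, the block size / checksum mismatch would be my first
suspect for the error you are getting.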

HDFS uses this checksum value to make sure the data doesn't get corrupted
during transfer (due to loss of bytes, etc.).

I hope setting your block size to a multiple of your CRC checksum size
will solve your problem.

Regards
Prav


On Fri, Jan 31, 2014 at 4:30 PM, Tom Brown <to...@gmail.com> wrote:

> What is the right way to use the "-crc" option with hadoop dfs
> -copyToLocal?
>
> Is this the wrong list?
>
> --Tom
>
>
> On Tue, Jan 28, 2014 at 11:53 AM, Tom Brown <to...@gmail.com> wrote:
>
>> I am archiving a large amount of data out of my HDFS file system to a
>> separate shared storage solution (There is not much HDFS space left in my
>> cluster, and upgrading it is not an option right now).
>>
>> I understand that HDFS internally manages checksums and won't succeed if
>> the data doesn't match the CRC, so I'm not worried about corruption when
>> reading from HDFS.
>>
>> However, I want to store the HDFS crc calculations alongside the data
>> files after exporting them. I thought the "hadoop dfs -copyToLocal -crc
>> <hdfs-source> <local-dest>" command would work, but it always gives me the
>> error "-crc option is not valid when source file system does not have crc
>> files"
>>
>> Can someone explain what exactly that option does, and when (if ever) it
>> should be used?
>>
>> Thanks in advance!
>>
>> --Tom
>>
>
>

Re: HDFS copyToLocal and get crc option

Posted by Tom Brown <to...@gmail.com>.
What is the right way to use the "-crc" option with hadoop dfs -copyToLocal?

Is this the wrong list?

--Tom


On Tue, Jan 28, 2014 at 11:53 AM, Tom Brown <to...@gmail.com> wrote:

> I am archiving a large amount of data out of my HDFS file system to a
> separate shared storage solution (There is not much HDFS space left in my
> cluster, and upgrading it is not an option right now).
>
> I understand that HDFS internally manages checksums and won't succeed if
> the data doesn't match the CRC, so I'm not worried about corruption when
> reading from HDFS.
>
> However, I want to store the HDFS crc calculations alongside the data
> files after exporting them. I thought the "hadoop dfs -copyToLocal -crc
> <hdfs-source> <local-dest>" command would work, but it always gives me the
> error "-crc option is not valid when source file system does not have crc
> files"
>
> Can someone explain what exactly that option does, and when (if ever) it
> should be used?
>
> Thanks in advance!
>
> --Tom
>
