Posted to user@hadoop.apache.org by Dmitry Sivachenko <tr...@gmail.com> on 2016/05/20 22:30:01 UTC

distcp fails with "source and target differ in block-size"

Hello,

When I copy files with distcp and -D dfs.blocksize=XXX (hadoop-2.7.2), it fails with a
"Source and target differ in block-size" error, even though MAPREDUCE-5065 was committed 3 years ago.

Is it possible to merge this change to 2.7 / 2.8 branches?

Thanks.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
For additional commands, e-mail: user-help@hadoop.apache.org


Re: distcp fails with "source and target differ in block-size"

Posted by Chris Nauroth <cn...@hortonworks.com>.
There is also some discussion on that JIRA considering a checksum strategy
independent of block size.  I don't think anything was ever implemented
though, and there would be some drawbacks to that approach.  Sorry if this
caused confusion.

--Chris Nauroth




On 5/24/16, 9:55 AM, "Dmitry Sivachenko" <tr...@gmail.com> wrote:

>
>> On 24 May 2016, at 19:53, Chris Nauroth <cn...@hortonworks.com> wrote:
>> 
>> Hello Dmitry,
>> 
>> To clarify, the intent of MAPREDUCE-5065 was to message the user that
>> using different block sizes on source and destination might cause a
>> failure due to a checksum mismatch.  The message to the user recommends either
>> the -pb (preserve block size) or -skipCrc (skip checksum validation) as
>> potential workarounds.  The intent of that patch was not to silently
>> proceed and report success when the block sizes are different, although
>> there was some discussion of that on the issue as a proposed solution.
>> 
>> To the best of my knowledge, this behavior hasn't really changed.  Only
>> the messaging to the user has changed to advise on some potential
>> workarounds.
>
>
>Okay, sorry for the misunderstanding. I thought the intention was to make
>the checksum independent of block size (which would be very intuitive).


Re: distcp fails with "source and target differ in block-size"

Posted by Dmitry Sivachenko <tr...@gmail.com>.
> On 24 May 2016, at 19:53, Chris Nauroth <cn...@hortonworks.com> wrote:
> 
> Hello Dmitry,
> 
> To clarify, the intent of MAPREDUCE-5065 was to message the user that
> using different block sizes on source and destination might cause a
> failure due to a checksum mismatch.  The message to the user recommends either
> the -pb (preserve block size) or -skipCrc (skip checksum validation) as
> potential workarounds.  The intent of that patch was not to silently
> proceed and report success when the block sizes are different, although
> there was some discussion of that on the issue as a proposed solution.
> 
> To the best of my knowledge, this behavior hasn't really changed.  Only
> the messaging to the user has changed to advise on some potential
> workarounds.


Okay, sorry for the misunderstanding. I thought the intention was to make the checksum independent of block size (which would be very intuitive).


Re: distcp fails with "source and target differ in block-size"

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Dmitry,

To clarify, the intent of MAPREDUCE-5065 was to message the user that
using different block sizes on source and destination might cause a
failure due to a checksum mismatch.  The message to the user recommends either
the -pb (preserve block size) or -skipCrc (skip checksum validation) as
potential workarounds.  The intent of that patch was not to silently
proceed and report success when the block sizes are different, although
there was some discussion of that on the issue as a proposed solution.

To the best of my knowledge, this behavior hasn't really changed.  Only
the messaging to the user has changed to advise on some potential
workarounds.
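
To illustrate why a checksum comparison can fail when only the block size differs, here is a minimal Python sketch of a composite, block-based file checksum. It is a simplified model and not the HDFS implementation: the 512-byte chunk size, the use of zlib's CRC32, and the helper names block_digest and file_checksum are all assumptions made for the example.

```python
import hashlib
import zlib

# Assumed chunk size for the per-chunk CRC (illustrative value only).
BYTES_PER_CHECKSUM = 512


def block_digest(block: bytes) -> bytes:
    """Return the MD5 of the CRC32 values of each fixed-size chunk in one block."""
    crcs = b"".join(
        zlib.crc32(block[i:i + BYTES_PER_CHECKSUM]).to_bytes(4, "big")
        for i in range(0, len(block), BYTES_PER_CHECKSUM)
    )
    return hashlib.md5(crcs).digest()


def file_checksum(data: bytes, block_size: int) -> str:
    """Return the MD5 of the per-block digests (a simplified MD5-of-MD5-of-CRC model)."""
    digests = b"".join(
        block_digest(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    )
    return hashlib.md5(digests).hexdigest()


data = bytes(range(256)) * 64  # 16 KiB of sample content

same_a = file_checksum(data, block_size=4096)
same_b = file_checksum(data, block_size=4096)
other = file_checksum(data, block_size=8192)

print("same bytes, same block size match:     ", same_a == same_b)
print("same bytes, different block size match:", same_a == other)
```

In this model the block boundaries decide how the chunk CRCs are grouped before the final hash, so byte-identical files written with different block sizes produce different file-level checksums.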

--Chris Nauroth




On 5/22/16, 10:31 AM, "Dmitry Sivachenko" <tr...@gmail.com> wrote:

>
>> On 21 May 2016, at 09:34, Dmitry Sivachenko <tr...@gmail.com> wrote:
>> 
>> 
>>> On 21 May 2016, at 02:15, Chris Nauroth <cn...@hortonworks.com> wrote:
>>> 
>>> Hello Dmitry,
>>> 
>>> MAPREDUCE-5065 has been included in these branches for a long time.  Are
>>> you certain that you passed a dfs.blocksize equal to what was used in the
>>> source files?  Did all source files use the same block size?
>>> 
>> 
>> 
>> No, I am sure that I use -D dfs.blocksize=DifferentThanSourceBlockSize
>> (I want to change it during the copy).
>> 
>> I am not sure that all source files use the same block size (there are
>> thousands of them), but it is probably wrong to report an error when I use
>> distcp to change the block size, since that is a well-documented way of
>> changing block size.
>> 
>> Sorry if I am missing something.
>> 
>
>
>So to be clear: right now with Hadoop-2.7.2 I always get a "checksum
>mismatch" error when I try to distcp a file with
>-Ddfs.blocksize=DifferentBlockSize
>
>This looks like undesired behaviour; at least some Stack Overflow
>answers suggest distcp as a way to change the block size of existing files:
>
>http://stackoverflow.com/questions/29604823/change-block-size-of-existing-files-in-hadoop
>
>So probably some time ago this did not lead to an error.




Re: distcp fails with "source and target differ in block-size"

Posted by Dmitry Sivachenko <tr...@gmail.com>.
> On 21 May 2016, at 09:34, Dmitry Sivachenko <tr...@gmail.com> wrote:
> 
> 
>> On 21 May 2016, at 02:15, Chris Nauroth <cn...@hortonworks.com> wrote:
>> 
>> Hello Dmitry,
>> 
>> MAPREDUCE-5065 has been included in these branches for a long time.  Are
>> you certain that you passed a dfs.blocksize equal to what was used in the
>> source files?  Did all source files use the same block size?
>> 
> 
> 
> No, I am sure that I use -D dfs.blocksize=DifferentThanSourceBlockSize (I want to change it during the copy).
> 
> I am not sure that all source files use the same block size (there are thousands of them), but it is probably wrong to report an error when I use distcp to change the block size, since that is a well-documented way of changing block size.
> 
> Sorry if I am missing something.
> 


So to be clear: right now with Hadoop-2.7.2 I always get a "checksum mismatch" error when I try to distcp a file with
-Ddfs.blocksize=DifferentBlockSize

This looks like undesired behaviour; at least some Stack Overflow answers suggest distcp as a way to change the block size of existing files:

http://stackoverflow.com/questions/29604823/change-block-size-of-existing-files-in-hadoop

So probably some time ago this did not lead to an error.
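
For reference, the two workarounds named in this thread can be sketched as command lines. The Python helper below only assembles the argument list and does not run anything. The helper name distcp_argv and the paths are hypothetical. The sketch also assumes that the DistCp command-line switch for skipping the checksum comparison is -skipcrccheck (the thread quotes it as -skipCrc) and that it is accepted only together with -update; check both against the DistCp documentation for your release.

```python
from typing import List, Optional


def distcp_argv(src: str, dst: str,
                block_size: Optional[int] = None,
                preserve_block_size: bool = False,
                skip_crc_check: bool = False) -> List[str]:
    """Build a `hadoop distcp` argument list (hypothetical helper, runs nothing)."""
    if block_size is not None and preserve_block_size:
        # -pb asks distcp to keep each source file's block size, which
        # defeats the purpose of requesting a different dfs.blocksize.
        raise ValueError("choose either a new block size or -pb, not both")
    argv = ["hadoop", "distcp"]
    if block_size is not None:
        # Generic -D options go before the distcp-specific options.
        argv += ["-D", f"dfs.blocksize={block_size}"]
    if preserve_block_size:
        argv.append("-pb")
    if skip_crc_check:
        # Assumption: -skipcrccheck is only valid in combination with -update.
        argv += ["-update", "-skipcrccheck"]
    return argv + [src, dst]


# Hypothetical paths; 268435456 bytes = 256 MiB.
change_size = distcp_argv("hdfs://nn1/data/in", "hdfs://nn2/data/out",
                          block_size=268435456, skip_crc_check=True)
keep_size = distcp_argv("hdfs://nn1/data/in", "hdfs://nn2/data/out",
                        preserve_block_size=True)

print(" ".join(change_size))
print(" ".join(keep_size))
```

Only the first form changes the block size, and it does so by giving up the checksum comparison, so corruption during the transfer would go unnoticed.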


Re: distcp fails with "source and target differ in block-size"

Posted by Dmitry Sivachenko <tr...@gmail.com>.
> On 21 May 2016, at 02:15, Chris Nauroth <cn...@hortonworks.com> wrote:
> 
> Hello Dmitry,
> 
> MAPREDUCE-5065 has been included in these branches for a long time.  Are
> you certain that you passed a dfs.blocksize equal to what was used in the
> source files?  Did all source files use the same block size?
> 


No, I am sure that I use -D dfs.blocksize=DifferentThanSourceBlockSize (I want to change it during the copy).

I am not sure that all source files use the same block size (there are thousands of them), but it is probably wrong to report an error when I use distcp to change the block size, since that is a well-documented way of changing block size.

Sorry if I am missing something.




Re: distcp fails with "source and target differ in block-size"

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Dmitry,

MAPREDUCE-5065 has been included in these branches for a long time.  Are
you certain that you passed a dfs.blocksize equal to what was used in the
source files?  Did all source files use the same block size?

--Chris Nauroth




On 5/20/16, 3:30 PM, "Dmitry Sivachenko" <tr...@gmail.com> wrote:

>Hello,
>
>When I copy files with distcp and -D dfs.blocksize=XXX (hadoop-2.7.2), it
>fails with a "Source and target differ in block-size" error, even though
>MAPREDUCE-5065 was committed 3 years ago.
>
>Is it possible to merge this change to 2.7 / 2.8 branches?
>
>Thanks.