Posted to common-dev@hadoop.apache.org by "Christian Kunz (JIRA)" <ji...@apache.org> on 2008/04/22 01:35:21 UTC

[jira] Created: (HADOOP-3294) distcp leaves empty blocks after successful execution

distcp leaves empty blocks after successful execution
-----------------------------------------------------

                 Key: HADOOP-3294
                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
             Project: Hadoop Core
          Issue Type: Bug
    Affects Versions: 0.16.3
         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
            Reporter: Christian Kunz


I copied around 40 TB between two Hadoop clusters, with distcp running on the source cluster.

The job was *successful*, but one destination file was empty because its only block was empty.
None of the distcp log files mention this file.

There were a couple of messages in the namenode server log of the destination cluster referencing the file:

hadoop-xxx-namenode-yyy.log.2008-04-19:2008-04-19 02:19:15,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: destinationDir/_distcp_tmp_z0g93p/fileName. blk_-9209890281741927376
hadoop-xxx-namenode-yyy.log.2008-04-19:2008-04-19 02:54:45,820 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on destinationDir/_distcp_tmp_z0g93p/fileName file does not exist.

distcp should not rely on the user to double-check.
Would it make sense to add a reducer to compare destination file sizes with source file sizes and take appropriate action?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3294) distcp leaves empty blocks after successful execution

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3294:
-------------------------------------------

    Attachment: 3294_20080423b_0.16.patch

3294_20080423b_0.16.patch: for 0.16

> distcp leaves empty blocks after successful execution
> -----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>            Assignee: Tsz Wo (Nicholas), SZE
>            Priority: Blocker
>             Fix For: 0.17.0
>
>         Attachments: 3294_20080423.patch, 3294_20080423b.patch, 3294_20080423b_0.16.patch
>
>
> I copied around 40 TB between two Hadoop clusters, with distcp running on the source cluster.
> The job was *successful*, but one destination file was empty because its only block was empty.
> None of the distcp log files mention this file.
> There were a couple of messages in the namenode server log of the destination cluster referencing the file:
> hadoop-xxx-namenode-yyy.log.2008-04-19:2008-04-19 02:19:15,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: destinationDir/_distcp_tmp_z0g93p/fileName. blk_-9209890281741927376
> hadoop-xxx-namenode-yyy.log.2008-04-19:2008-04-19 02:54:45,820 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on destinationDir/_distcp_tmp_z0g93p/fileName file does not exist.
> distcp should not rely on the user to double-check.
> Would it make sense to add a reducer to compare destination file sizes with source file sizes and take appropriate action?



[jira] Commented: (HADOOP-3294) distcp leaves empty blocks after successful execution

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592369#action_12592369 ] 

Hudson commented on HADOOP-3294:
--------------------------------

Integrated in Hadoop-trunk #470 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/470/])



[jira] Commented: (HADOOP-3294) distcp leaves empty blocks after successful execution

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591358#action_12591358 ] 

Doug Cutting commented on HADOOP-3294:
--------------------------------------

Verifying lengths is cheap and would catch many problems.  It could be done by the reducer, and the output could list any discrepancies.  Checking CRCs is more expensive and should be optional if implemented.

> Verifying file sizes could have some implication when we support "appends".

That's true, so we shouldn't have a discrepancy fail the job, but it should still be logged so that the user can see which files were modified after they were copied.
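
As a rough illustration of the "list discrepancies, but don't fail" idea, here is a minimal sketch in plain Java. It does not use the Hadoop API; the class name LengthReport, the method discrepancies, and the use of in-memory maps keyed by path are all assumptions made for the example, not part of distcp.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of distcp: compares recorded source lengths
// against destination lengths and reports mismatches instead of throwing.
final class LengthReport {
    // dstLengths has no entry for files that are missing at the destination.
    static List<String> discrepancies(Map<String, Long> srcLengths,
                                      Map<String, Long> dstLengths) {
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, Long> e : srcLengths.entrySet()) {
            Long dst = dstLengths.get(e.getKey());
            if (dst == null) {
                report.add(e.getKey() + ": missing at destination");
            } else if (!dst.equals(e.getValue())) {
                report.add(e.getKey() + ": src=" + e.getValue() + " dst=" + dst);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        Map<String, Long> src = new LinkedHashMap<>();
        src.put("dir/a", 1024L);
        src.put("dir/b", 2048L);
        Map<String, Long> dst = new LinkedHashMap<>();
        dst.put("dir/a", 1024L);
        dst.put("dir/b", 0L); // the empty-file case from this issue
        for (String line : discrepancies(src, dst)) {
            System.out.println(line); // prints "dir/b: src=2048 dst=0"
        }
    }
}
```

Because the method only returns a report, the caller decides whether a mismatch is fatal or merely logged, which leaves room for the append case raised above.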



[jira] Updated: (HADOOP-3294) distcp leaves empty blocks after successful execution

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3294:
-------------------------------------------

    Attachment: 3294_20080423.patch

3294_20080423.patch: check whether dst size == src size after copy and rename.
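
For readers without the patch at hand, the general shape of a "copy, rename, then compare sizes" check can be sketched as follows. This is not the patch itself: it uses java.nio.file on the local filesystem as a stand-in for Hadoop's FileSystem API, and the names CheckedCopy, copyAndVerify and verifySameSize are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch on the local filesystem only; the actual patch works against
// Hadoop's FileSystem API, which is not used here.
final class CheckedCopy {
    // Copies src to a temporary path, renames it to dst, then verifies sizes.
    static void copyAndVerify(Path src, Path tmp, Path dst) throws IOException {
        Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        Files.move(tmp, dst, StandardCopyOption.REPLACE_EXISTING);
        verifySameSize(src, dst);
    }

    // Throws if the two files differ in length.
    static void verifySameSize(Path src, Path dst) throws IOException {
        long srcLen = Files.size(src);
        long dstLen = Files.size(dst);
        if (srcLen != dstLen) {
            throw new IOException("File size mismatch: " + src + " (" + srcLen
                + " bytes) vs " + dst + " (" + dstLen + " bytes)");
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("checkedcopy");
        Path src = Files.write(dir.resolve("src"), new byte[] {1, 2, 3});
        copyAndVerify(src, dir.resolve("_tmp"), dir.resolve("dst"));
        System.out.println("sizes match: " + Files.size(dir.resolve("dst")));
    }
}
```

Throwing from the check makes the copy fail, so that in a map task the failure would surface and the copy could be retried rather than passing silently.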



[jira] Commented: (HADOOP-3294) distcp leaves empty blocks after successful execution

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592144#action_12592144 ] 

Chris Douglas commented on HADOOP-3294:
---------------------------------------

> Perhaps we should have a distcp 'sync' mode where it first checks if each source and target have the same length and/or date and skips copying when they do.

This is already in distcp as -update. Its semantics are a little odd: it assumes that the src tree matches the destination rather than following the usual cp semantics, but it overwrites the destination file iff its size is different from that of the source file.
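
The size-based rule described for -update can be modelled in a few lines. This is only a sketch of the decision as stated in this comment, not code from distcp; UpdateDecision and shouldCopy are hypothetical names, and copying when the destination is missing is an assumption of the example.

```java
// Hypothetical model of the size-based decision described above; the names
// are illustrative and not taken from the distcp source.
final class UpdateDecision {
    // dstLength is null when the destination file does not exist
    // (assumed here to mean the file must be copied).
    static boolean shouldCopy(long srcLength, Long dstLength) {
        return dstLength == null || dstLength != srcLength;
    }

    public static void main(String[] args) {
        System.out.println(shouldCopy(100L, null));  // true: nothing there yet
        System.out.println(shouldCopy(100L, 0L));    // true: sizes differ
        System.out.println(shouldCopy(100L, 100L));  // false: same size, skipped
    }
}
```

Under this rule a destination file of the same length but different content would be skipped, which is one reason an optional CRC check was discussed in this thread.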

+1



[jira] Commented: (HADOOP-3294) distcp leaves empty blocks after successful execution

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592222#action_12592222 ] 

Chris Douglas commented on HADOOP-3294:
---------------------------------------

> As a matter of fact, speculative execution was enabled before launching distcp

Running distcp with speculative execution turned on is definitely not supported. distcp disables it before starting the job, but if it is somehow turned on during the copy, then distcp's behavior, particularly with -update or -overwrite, is undefined.



[jira] Commented: (HADOOP-3294) distcp leaves empty blocks after successful execution

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592440#action_12592440 ] 

Chris Douglas commented on HADOOP-3294:
---------------------------------------

If speculative execution were specified as a final param, then it could be turned on during the job even though distcp disables it.
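
To make the "final param" point concrete, here is a toy model of how a value declared final in an earlier-loaded resource wins over a later one. It is not Hadoop's Configuration class: FinalParams and load are hypothetical, and the key name mapred.speculative.execution is assumed for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of "final" configuration parameters; it is not Hadoop's
// Configuration class, and the method names are hypothetical.
final class FinalParams {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finals = new HashSet<>();

    // Applies one setting from a resource loaded after any earlier ones.
    // A key already marked final keeps its earlier value.
    void load(String key, String value, boolean isFinal) {
        if (finals.contains(key)) {
            return;
        }
        values.put(key, value);
        if (isFinal) {
            finals.add(key);
        }
    }

    String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        FinalParams conf = new FinalParams();
        // Cluster-side resource: speculative execution on, marked final.
        conf.load("mapred.speculative.execution", "true", true);
        // Job-side resource: the job asks for it to be off.
        conf.load("mapred.speculative.execution", "false", false);
        System.out.println(conf.get("mapred.speculative.execution")); // prints "true"
    }
}
```

In this model the job-side request is ignored without any error, which would match the situation described here where the setting is on during the copy despite being disabled by the job.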



[jira] Commented: (HADOOP-3294) distcp leaves empty blocks after successful execution

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591877#action_12591877 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3294:
------------------------------------------------

My guess is that the file was read successfully from the source cluster but the write to the destination cluster failed, and that the file/lease was then deleted silently due to HADOOP-2891.

distcp currently checks whether the number of bytes processed in the mapper is equal to the source size.  I will add some code to check whether the destination size is equal to the source size.
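
The difference between the two checks can be shown with a small stand-alone sketch. It assumes a destination that silently loses data, modelled by the hypothetical LossySink class; the names CopyChecks and copy are also made up for the example, and nothing here is taken from the distcp source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class CopyChecks {
    // Copies in to out and returns the number of bytes read from in.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // Stand-in for a destination that silently drops everything written to it.
    static final class LossySink extends ByteArrayOutputStream {
        @Override
        public synchronized void write(byte[] b, int off, int len) {
            // data is discarded on purpose
        }

        @Override
        public synchronized void write(int b) {
            // data is discarded on purpose
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] src = new byte[10_000];
        ByteArrayOutputStream dst = new LossySink();
        long processed = copy(new ByteArrayInputStream(src), dst);
        // The byte-count check passes even though nothing was stored.
        System.out.println("bytes processed == source size: " + (processed == src.length));
        // Only the destination-size check notices the empty result.
        System.out.println("destination size == source size: " + (dst.size() == src.length));
    }
}
```

The first line printed is true and the second is false: counting bytes on the reading side says nothing about what actually landed at the destination.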



[jira] Commented: (HADOOP-3294) distcp leaves empty blocks after successful execution

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592130#action_12592130 ] 

Doug Cutting commented on HADOOP-3294:
--------------------------------------

+1 This looks fine to me.  If a copy fails, then its map fails and it will be retried.  If another process is updating files while distcp is running, then the distcp job may fail.  Someday we may want to permit jobs to proceed even when individual copies fail due to updates (perhaps useful for, e.g., backing up a live filesystem) but that would be a new feature.



[jira] Updated: (HADOOP-3294) distcp leaves empty blocks after successful execution

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3294:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Nicholas



[jira] Issue Comment Edited: (HADOOP-3294) distcp leaves empty blocks after successful execution

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592144#action_12592144 ] 

chris.douglas edited comment on HADOOP-3294 at 4/24/08 11:01 AM:
-----------------------------------------------------------------

> Perhaps we should have a distcp 'sync' mode where it first checks if each source and target have the same length and/or date and skips copying when they do.

This is already in distcp as -update. Its semantics are a little odd: it assumes that the src tree matches the destination rather than following the usual cp semantics, but it overwrites the destination file iff its size is different from that of the source file.

+1



[jira] Commented: (HADOOP-3294) distcp leaves empty blocks after successful execution

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591340#action_12591340 ] 

dhruba borthakur commented on HADOOP-3294:
------------------------------------------

Verifying file sizes could have implications once we support "appends". In that case, the source file could be appended to before the file-size check occurs. The same is true for any kind of CRC checking of the contents.



[jira] Commented: (HADOOP-3294) distcp leaves empty blocks after successful execution

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591908#action_12591908 ] 

Hadoop QA commented on HADOOP-3294:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12380810/3294_20080423b.patch
against trunk revision 645773.

    @author +1.  The patch does not contain any @author tags.

    tests included -1.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2315/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2315/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2315/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2315/console

This message is automatically generated.

> distcp leaves empty blocks afte successful execution
> ----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 3294_20080423.patch, 3294_20080423b.patch
>
>


[jira] Commented: (HADOOP-3294) distcp leaves empty blocks afte successful execution

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591363#action_12591363 ] 

Christian Kunz commented on HADOOP-3294:
----------------------------------------

> Verifying file sizes could have some implication when we support "appends".
Also, in this case the file time changes. Further, if the destination file size is suspect (a multiple of the block size, including 0), it would not hurt to copy such a file again (e.g. in the reducer).
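
A minimal sketch of the "suspect size" heuristic mentioned above, for illustration only. The class and method names are hypothetical and are not part of distcp; the 64 MB block size in the demo is an assumed value.

```java
// Hypothetical helper, not distcp code: flags destination lengths that
// deserve a second look under the heuristic described above.
final class SuspectSize {
    /** Returns true when length is 0 or an exact multiple of blockSize. */
    static boolean isSuspect(long length, long blockSize) {
        if (length < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and blockSize > 0");
        }
        // 0 % blockSize == 0, so empty files are flagged as well.
        return length % blockSize == 0;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed block size for the demo
        System.out.println("empty file suspect: " + isSuspect(0, blockSize));
        System.out.println("3 full blocks suspect: " + isSuspect(3 * blockSize, blockSize));
        System.out.println("1 block + 1 byte suspect: " + isSuspect(blockSize + 1, blockSize));
    }
}
```

A file flagged this way is not necessarily wrong; the check only narrows down which files might be worth copying again.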

> distcp leaves empty blocks afte successful execution
> ----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>


[jira] Updated: (HADOOP-3294) distcp leaves empty blocks afte successful execution

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3294:
----------------------------------

         Priority: Blocker  (was: Major)
    Fix Version/s: 0.17.0
     Hadoop Flags: [Reviewed]

> distcp leaves empty blocks afte successful execution
> ----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>            Assignee: Tsz Wo (Nicholas), SZE
>            Priority: Blocker
>             Fix For: 0.17.0
>
>         Attachments: 3294_20080423.patch, 3294_20080423b.patch
>
>


[jira] Commented: (HADOOP-3294) distcp leaves empty blocks afte successful execution

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591167#action_12591167 ] 

Christian Kunz commented on HADOOP-3294:
----------------------------------------

As a matter of fact, speculative execution was enabled before launching distcp. On the other hand, the actual job configuration had
mapred.map.tasks.speculative.execution=false,
i.e. distcp must have turned it off before submitting the job.

> distcp leaves empty blocks afte successful execution
> ----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>


[jira] Updated: (HADOOP-3294) distcp leaves empty blocks afte successful execution

Posted by "Mukund Madhugiri (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukund Madhugiri updated HADOOP-3294:
-------------------------------------

    Fix Version/s:     (was: 0.17.0)
                   0.16.4

I committed this to 0.16.4. Thanks, Nicholas.

> distcp leaves empty blocks afte successful execution
> ----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>            Assignee: Tsz Wo (Nicholas), SZE
>            Priority: Blocker
>             Fix For: 0.16.4
>
>         Attachments: 3294_20080423.patch, 3294_20080423b.patch, 3294_20080423b_0.16.patch
>
>


[jira] Updated: (HADOOP-3294) distcp leaves empty blocks afte successful execution

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3294:
-------------------------------------------

    Attachment: 3294_20080423b.patch

3294_20080423b.patch: throw IOException instead of logging.
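
The patch itself is not reproduced in this thread. Purely as an illustration of the general idea (fail loudly on a bad copy rather than only writing a log line), a post-copy length check might look like the sketch below. The class name, method name, and message format are hypothetical and are not taken from the patch.

```java
import java.io.IOException;

// Hypothetical sketch, not the contents of 3294_20080423b.patch.
final class CopyCheck {
    /**
     * Throws IOException when the copied file's length differs from the
     * source length, so that the calling task fails instead of succeeding
     * with only a log message.
     */
    static void verifyLength(String path, long expected, long actual) throws IOException {
        if (expected != actual) {
            throw new IOException("Length mismatch for " + path
                + ": expected " + expected + " bytes, found " + actual);
        }
    }

    public static void main(String[] args) {
        try {
            verifyLength("destinationDir/fileName", 1024, 0);
        } catch (IOException e) {
            System.out.println("copy rejected: " + e.getMessage());
        }
    }
}
```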

> distcp leaves empty blocks afte successful execution
> ----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>         Attachments: 3294_20080423.patch, 3294_20080423b.patch
>
>


[jira] Commented: (HADOOP-3294) distcp leaves empty blocks afte successful execution

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591338#action_12591338 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3294:
------------------------------------------------

I think it is a good idea to verify file sizes after copying.

We should probably provide some way to verify the file content.  Computing a message digest of the entire file does not seem practical.  Instead, we could compute message digests over the CRC checksums.  That would be much more efficient, since the CRC checksums are relatively small.
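
To make the digest-over-checksums idea concrete, here is a small self-contained sketch. It does not use any HDFS API: it computes a CRC32 per fixed-size chunk with java.util.zip.CRC32 and feeds only those 4-byte values into an MD5 digest. The class name, the choice of MD5, and the 512-byte chunk size in the demo are assumptions made for illustration. In a real file system the CRCs would already be stored, so only they would need to be read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical sketch: MD5 over per-chunk CRC32 values instead of over the data.
final class CrcDigest {
    /** Returns the MD5 of the CRC32 sequence, one CRC per chunkSize bytes of input. */
    static byte[] digestOfCrcs(InputStream in, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is required on every Java platform
        }
        byte[] buf = new byte[chunkSize];
        CRC32 crc = new CRC32();
        int filled;
        while ((filled = fill(in, buf)) > 0) {
            crc.reset();
            crc.update(buf, 0, filled);
            // Only the 4-byte checksum enters the digest, not the chunk itself.
            md5.update(ByteBuffer.allocate(4).putInt((int) crc.getValue()).array());
        }
        return md5.digest();
    }

    /** Convenience overload for in-memory data. */
    static byte[] digestOfCrcs(byte[] data, int chunkSize) {
        try {
            return digestOfCrcs(new ByteArrayInputStream(data), chunkSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads until buf is full or the stream ends; returns the number of bytes read. */
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                break;
            }
            off += n;
        }
        return off;
    }

    public static void main(String[] args) {
        byte[] source = new byte[2000];
        Arrays.fill(source, (byte) 7);
        byte[] copy = source.clone();
        System.out.println("identical copy matches: "
            + Arrays.equals(digestOfCrcs(source, 512), digestOfCrcs(copy, 512)));
        copy[1500] ^= 1; // flip one bit in the copy
        System.out.println("corrupted copy matches: "
            + Arrays.equals(digestOfCrcs(source, 512), digestOfCrcs(copy, 512)));
    }
}
```

Both sides must use the same chunk size for the digests to be comparable, which is one reason a scheme like this depends on how each cluster is configured.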

> distcp leaves empty blocks afte successful execution
> ----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>


[jira] Updated: (HADOOP-3294) distcp leaves empty blocks afte successful execution

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Kunz updated HADOOP-3294:
-----------------------------------

    Component/s: util

> distcp leaves empty blocks afte successful execution
> ----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>


[jira] Commented: (HADOOP-3294) distcp leaves empty blocks afte successful execution

Posted by "Christian Kunz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592240#action_12592240 ] 

Christian Kunz commented on HADOOP-3294:
----------------------------------------

distcp turns speculative execution off before launching the map-reduce job -- how do you imagine it would get turned on again during the copy?

> distcp leaves empty blocks afte successful execution
> ----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>            Assignee: Tsz Wo (Nicholas), SZE
>            Priority: Blocker
>             Fix For: 0.17.0
>
>         Attachments: 3294_20080423.patch, 3294_20080423b.patch
>
>


[jira] Commented: (HADOOP-3294) distcp leaves empty blocks afte successful execution

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591377#action_12591377 ] 

Doug Cutting commented on HADOOP-3294:
--------------------------------------

> it would not hurt to copy such a file again (e.g. in the reducer)

If a process is continually updating the source then such copying will never complete.

Perhaps we should have a distcp 'sync' mode where it first checks if each source and target have the same length and/or date and skips copying when they do.  Then one could choose to repeat distcp until it completes with no discrepancies.  Such looping could be built into the client.
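
A rough sketch of what such a 'sync' pass and client-side loop could look like. Nothing here is distcp code: the Meta class stands in for real file status, the Copier interface stands in for the actual copy job, and the maxRounds cap is an added assumption to keep the loop from running forever when the source keeps changing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a length/date based sync pass; not distcp code.
final class SyncSketch {
    /** Length and modification time of one file (stand-in for real file status). */
    static final class Meta {
        final long length;
        final long mtime;
        Meta(long length, long mtime) {
            this.length = length;
            this.mtime = mtime;
        }
    }

    /** Stand-in for whatever actually copies one path from source to target. */
    interface Copier {
        void copy(String path);
    }

    /** Paths that are missing on the target or differ in length (and, optionally, date). */
    static List<String> needsCopy(Map<String, Meta> source, Map<String, Meta> target,
                                  boolean compareTime) {
        List<String> pending = new ArrayList<>();
        for (Map.Entry<String, Meta> e : source.entrySet()) {
            Meta src = e.getValue();
            Meta dst = target.get(e.getKey());
            boolean same = dst != null
                && dst.length == src.length
                && (!compareTime || dst.mtime == src.mtime);
            if (!same) {
                pending.add(e.getKey());
            }
        }
        return pending;
    }

    /**
     * Repeats the copy until a pass finds no discrepancies.
     * Returns the number of copy rounds performed, or -1 if discrepancies
     * remain after maxRounds rounds.
     */
    static int syncUntilClean(Map<String, Meta> source, Map<String, Meta> target,
                              Copier copier, boolean compareTime, int maxRounds) {
        for (int round = 0; round <= maxRounds; round++) {
            List<String> pending = needsCopy(source, target, compareTime);
            if (pending.isEmpty()) {
                return round;
            }
            if (round == maxRounds) {
                break;
            }
            for (String path : pending) {
                copier.copy(path);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<String, Meta> source = new TreeMap<>();
        Map<String, Meta> target = new TreeMap<>();
        source.put("a", new Meta(10, 1));
        source.put("b", new Meta(20, 2));
        target.put("a", new Meta(10, 1));
        target.put("b", new Meta(0, 2)); // the empty destination file
        int rounds = syncUntilClean(source, target,
            path -> target.put(path, source.get(path)), false, 5);
        System.out.println("copy rounds needed: " + rounds);
    }
}
```

Comparing dates only makes sense if the copy preserves modification times, which is why it is left as an option in this sketch.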

> distcp leaves empty blocks afte successful execution
> ----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>


[jira] Commented: (HADOOP-3294) distcp leaves empty blocks afte successful execution

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591163#action_12591163 ] 

Koji Noguchi commented on HADOOP-3294:
--------------------------------------

It may be unrelated, but do you have speculative execution turned on?


> distcp leaves empty blocks afte successful execution
> ----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>


[jira] Updated: (HADOOP-3294) distcp leaves empty blocks afte successful execution

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3294:
-------------------------------------------

    Assignee: Tsz Wo (Nicholas), SZE
      Status: Patch Available  (was: Open)

> distcp leaves empty blocks afte successful execution
> ----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 3294_20080423.patch, 3294_20080423b.patch
>
>


[jira] Commented: (HADOOP-3294) distcp leaves empty blocks afte successful execution

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591418#action_12591418 ] 

dhruba borthakur commented on HADOOP-3294:
------------------------------------------

Maybe this is related to HADOOP-2891, where the dfsclient gets closed before the file is closed. In this case, the client used to call abandonFileInprogress. This code has been removed in 0.17 and later releases.

> distcp leaves empty blocks afte successful execution
> ----------------------------------------------------
>
>                 Key: HADOOP-3294
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3294
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.16.3
>         Environment: 0.16.3 without any patches. Dfs permissions turned off everywhere, such that HADOOP-3138 and HADOOP-3186 do not apply
>            Reporter: Christian Kunz
>