Posted to common-dev@hadoop.apache.org by "Bryan Pendleton (JIRA)" <ji...@apache.org> on 2007/01/08 20:12:27 UTC

[jira] Created: (HADOOP-865) Files written to S3 but never closed can't be deleted

Files written to S3 but never closed can't be deleted
-----------------------------------------------------

                 Key: HADOOP-865
                 URL: https://issues.apache.org/jira/browse/HADOOP-865
             Project: Hadoop
          Issue Type: Bug
          Components: fs
            Reporter: Bryan Pendleton


I've been playing with the S3 integration. My first attempts to use it are actually as a drop-in replacement for a backup job, streaming data offsite by piping the backup job output to a "hadoop dfs -put - targetfile".

If enough errors occur while posting to S3 (which happened easily last Thursday, during an S3 growth issue), the write can eventually fail. At that point, both blocks and a partial INode have been written into S3. A "hadoop dfs -ls filename" shows the file, with a non-zero size and so on. However, running "hadoop dfs -rm filename" on such a failed write results in the response "rm: No such file or directory."
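
A rough sketch of the same observation made through the 0.10-era Java FileSystem API rather than the shell; the bucket name, path, and configuration are invented for illustration, credentials are omitted, and this is not a reproduction recipe from the report:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StuckS3File {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical bucket; pointing fs.default.name at an s3:// URI
            // selects the block-based S3FileSystem in this era.
            conf.set("fs.default.name", "s3://backup-bucket");
            FileSystem fs = FileSystem.get(conf);

            Path p = new Path("/backups/targetfile");   // invented path
            System.out.println(fs.exists(p));  // true: the partial inode is visible to listings
            System.out.println(fs.delete(p));  // reported to fail with "No such file or directory"
        }
    }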

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Resolved: (HADOOP-865) Files written to S3 but never closed can't be deleted

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting resolved HADOOP-865.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.10.1
         Assignee: Tom White

I just committed this.  Thanks, Tom!


[jira] Updated: (HADOOP-865) Files written to S3 but never closed can't be deleted

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-865:
-----------------------------

    Attachment: hadoop-865.patch


[jira] Commented: (HADOOP-865) Files written to S3 but never closed can't be deleted

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463128 ] 

Tom White commented on HADOOP-865:
----------------------------------

Do you know if the files (inodes or blocks) got corrupted, or if a block didn't get written? If you still have the files on S3 then it would be really helpful if you could send an S3 directory listing using a regular S3 tool (e.g. http://www.hanzoarchives.com/development-projects/s3-tools/).

Thanks.

BTW nice use of S3FileSystem as an infinite disk!

Tom




[jira] Commented: (HADOOP-865) Files written to S3 but never closed can't be deleted

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463140 ] 

Tom White commented on HADOOP-865:
----------------------------------

I think I've spotted the problem: the deleteRaw method throws an IOException if the inode doesn't exist, unlike the DFS or local implementations. I'll produce a patch; thanks for the offer to test it.

Tom
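
A toy model follows; it is not the S3FileSystem source or the committed patch, just a self-contained contrast of the two delete semantics described above, using an in-memory set as a stand-in for the inode store:

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    public class DeleteSemantics {
        static final Set<String> inodes = new HashSet<String>();  // stand-in for the inode store

        // S3FileSystem-style behaviour before the fix, as described in the comment:
        // a missing inode raises an exception.
        static boolean deleteThrowing(String path) throws IOException {
            if (!inodes.contains(path)) {
                throw new IOException("No such file or directory: " + path);
            }
            return inodes.remove(path);
        }

        // DFS/local-style behaviour: deleting a missing file just returns false.
        static boolean deleteReturningFalse(String path) {
            return inodes.remove(path);
        }

        public static void main(String[] args) throws IOException {
            System.out.println(deleteReturningFalse("/backups/missing"));  // false, no exception
            deleteThrowing("/backups/missing");  // throws, which surfaces as "rm: No such file or directory"
        }
    }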


[jira] Commented: (HADOOP-865) Files written to S3 but never closed can't be deleted

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463394 ] 

Tom White commented on HADOOP-865:
----------------------------------

Bryan,

The patch should be a simple fix for the problem. If you try "hadoop dfs -rm filename" it should now work.

Note that -rmr doesn't work yet (I will create another patch for this).

Thanks,

Tom


[jira] Commented: (HADOOP-865) Files written to S3 but never closed can't be deleted

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12463135 ] 

Bryan Pendleton commented on HADOOP-865:
----------------------------------------

Looks like it might actually be .crc related... but I thought this file hadn't even been closed at the time.

Note that an -ls /backups/fon1 reflects:
/backups/fon1/backup.010507-1739.cpio.bz2.gpg   <r 1>   1048576
Yet there are some .crc files left over from previous -rm operations, so there are probably some other lingering problems around.

%2F
%2Fbackups
%2Fbackups%2Ffon1
%2Fbackups%2Ffon1%2F.backup.010507-1736.cpio.bz2.gpg.crc
%2Fbackups%2Ffon1%2F.backup.010807-1303.cpio.bz2.gpg.crc
%2Fbackups%2Ffon1%2Fbackup.010507-1739.cpio.bz2.gpg
block_-3795133870143584439
block_-8360567787439934597
block_8856210385271099486

I'll keep this data around for a little while, in case you think there are any patches that you'd like me to test.
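
The %2F-prefixed keys in the listing above appear to be URL-encoded filesystem paths (one key per inode), while the block_<id> entries are the raw data blocks. A small standalone snippet, using one key copied from the listing, shows the decoding:

    import java.net.URLDecoder;

    public class DecodeS3Key {
        public static void main(String[] args) throws Exception {
            // One key copied from the listing above; decoding recovers the logical path.
            String key = "%2Fbackups%2Ffon1%2F.backup.010507-1736.cpio.bz2.gpg.crc";
            System.out.println(URLDecoder.decode(key, "UTF-8"));
            // prints: /backups/fon1/.backup.010507-1736.cpio.bz2.gpg.crc
        }
    }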

