Posted to common-issues@hadoop.apache.org by "Arpit Agarwal (JIRA)" <ji...@apache.org> on 2015/08/27 22:34:47 UTC

[jira] [Comment Edited] (HADOOP-12358) FSShell should prompt before deleting directories bigger than a configured size

    [ https://issues.apache.org/jira/browse/HADOOP-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717481#comment-14717481 ] 

Arpit Agarwal edited comment on HADOOP-12358 at 8/27/15 8:34 PM:
-----------------------------------------------------------------

The following concerns were raised on the Jira:
# Compatibility. The checks are off by default.
# {{getContentSummary}} requires too many RPCs for filesystems other than DFS.
# Configuration complexity.
** We can get rid of the boolean setting, e.g. just disable the check if the thresholds are zero or negative. If we also get rid of the size-based threshold, we only need one new setting for the inode count threshold.
# {{getContentSummary}} is expensive. This is a valid concern.
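The single-setting idea above can be sketched as follows. This is a minimal illustration, not an existing Hadoop API; the configuration key name is hypothetical and would be decided on the Jira.

```java
/**
 * Sketch of the proposed single-setting semantics: a non-positive
 * threshold disables the check entirely, removing the need for a
 * separate boolean enable flag. All names here are hypothetical.
 */
public class DeleteThreshold {
    // Hypothetical configuration key; not an existing Hadoop setting.
    public static final String KEY = "fs.shell.delete.limit.num.inodes";

    /** Returns true if a delete of {@code inodeCount} inodes should be blocked. */
    public static boolean exceedsThreshold(long inodeCount, long threshold) {
        // A threshold of zero or below means the feature is disabled.
        return threshold > 0 && inodeCount > threshold;
    }
}
```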


Does it make sense to move this check to the NN? NN already does a recursive permissions check for every delete call ({{FsPermissionChecker#checkSubAccess}}). A suggested approach:
# Add a {{FileSystem#delete}} overload that takes a threshold. 
# Extend the recursive permissions check to compute the number of descendant inodes. It is a little ugly but avoids recursing twice. We can skip the file size check.
# If the computed inode count is below the threshold the dir is deleted, else the call fails.
# If the call fails, the shell command displays the prompt. If the user chooses Y, it invokes the regular delete call.
# If the underlying filesystem does not support checking the threshold then it just performs a regular delete. This takes care of the first concern above.
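The control flow of the steps above can be sketched as below. {{MockFs}}, {{DeleteLimitExceededException}}, and the three-argument {{delete}} overload are hypothetical stand-ins, not existing Hadoop APIs; only the client/server interaction is illustrated.

```java
import java.io.IOException;

/** Control-flow sketch of the proposed approach; all names are hypothetical. */
public class DeletePromptSketch {

    /** Thrown when a delete would remove more inodes than the threshold allows. */
    static class DeleteLimitExceededException extends IOException {
        DeleteLimitExceededException(long inodeCount) {
            super("delete would remove " + inodeCount + " inodes");
        }
    }

    /** Stand-in for the server side: the descendant-inode count would be
     *  computed during the recursive permission walk; here it is a field. */
    static class MockFs {
        final long inodes;
        boolean deleted = false;
        MockFs(long inodes) { this.inodes = inodes; }

        void delete(String path, boolean recursive, long threshold)
                throws DeleteLimitExceededException {
            // Threshold of zero or below disables the check.
            if (threshold > 0 && inodes > threshold) {
                throw new DeleteLimitExceededException(inodes);
            }
            deleted = true;
        }
    }

    /** Shell side: attempt the limited delete; on failure, prompt and
     *  (if the user confirms) fall back to the regular unlimited delete. */
    static boolean shellDelete(MockFs fs, String path, long threshold,
                               boolean userConfirms) {
        try {
            fs.delete(path, true, threshold);
            return true;
        } catch (DeleteLimitExceededException e) {
            if (userConfirms) {
                try {
                    fs.delete(path, true, 0); // no limit: regular delete
                } catch (DeleteLimitExceededException ignored) { }
                return true;
            }
            return false; // default behavior: fail the delete
        }
    }
}
```

Note the default on failure is to return an error rather than prompt, which keeps the feature safe for scripts when it is enabled.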

This still has the potential to break automation when the feature is enabled, so the default behavior can be to simply fail the delete call. An additional parameter could allow prompting to override the check.



> FSShell should prompt before deleting directories bigger than a configured size
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-12358
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12358
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>         Attachments: HADOOP-12358.00.patch, HADOOP-12358.01.patch, HADOOP-12358.02.patch, HADOOP-12358.03.patch
>
>
> We have seen many cases of customers deleting data inadvertently with -skipTrash. The FSShell should prompt the user if the size of the data or the number of files being deleted is bigger than a threshold, even when -skipTrash is being used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)