Posted to common-dev@hadoop.apache.org by "Robert Chansler (JIRA)" <ji...@apache.org> on 2008/03/19 23:36:24 UTC
[jira] Created: (HADOOP-3052) distch -- tool to do parallel ch*
distch -- tool to do parallel ch*
----------------------------------
Key: HADOOP-3052
URL: https://issues.apache.org/jira/browse/HADOOP-3052
Project: Hadoop Core
Issue Type: Task
Components: dfs
Affects Versions: 0.16.1
Reporter: Robert Chansler
Assignee: Tsz Wo (Nicholas), SZE
Fix For: 0.16.2
Build a tool to do parallel ch{mod,grp,own} on files.
This would have an advantage over the shell -R commands: name node syncs from multiple clients are effectively batched.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-3052) distch -- tool to do parallel ch*
Posted by "Rob Weltman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646610#action_12646610 ]
Rob Weltman commented on HADOOP-3052:
-------------------------------------
Alternate/better approach in HADOOP-3194
[jira] Updated: (HADOOP-3052) distch -- tool to do parallel ch*
Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sameer Paranjpye updated HADOOP-3052:
-------------------------------------
There doesn't appear to be much point in releasing and supporting this artifact. A patch can be made available as a one-off for those who need it.
[jira] Commented: (HADOOP-3052) distch -- tool to do parallel ch*
Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583134#action_12583134 ]
Raghu Angadi commented on HADOOP-3052:
--------------------------------------
> is this really something that we need to optimize?
The main use case considered is when a big cluster is upgraded to 0.16.
I hope this is more of a one-time utility than a real supported tool like distcp.
[jira] Commented: (HADOOP-3052) distch -- tool to do parallel ch*
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580649#action_12580649 ]
Doug Cutting commented on HADOOP-3052:
--------------------------------------
Wouldn't this be a DDoS attack on the namenode?
[jira] Commented: (HADOOP-3052) distch -- tool to do parallel ch*
Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583130#action_12583130 ]
Doug Cutting commented on HADOOP-3052:
--------------------------------------
My concern is two-part: (1) Is this really something that we need to optimize? Is single-threaded 'chmod -R' so slow that applications are spending a significant amount of their time in it? And (2) is it perhaps a feature that someone who runs 'chmod -R' isn't able to overwhelm the namenode? The namenode is often shared between multiple mapreduce clusters (e.g. under HOD), but a single mapreduce cluster running a distributed 'chmod -R' could overwhelm the namenode and prevent other applications from making progress.
[jira] Updated: (HADOOP-3052) distch -- tool to do parallel ch*
Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo (Nicholas), SZE updated HADOOP-3052:
-------------------------------------------
Attachment: 3052_20080411.patch
3052_20080411.patch: my testing program. Someone may find it useful.
[jira] Resolved: (HADOOP-3052) distch -- tool to do parallel ch*
Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sameer Paranjpye resolved HADOOP-3052.
--------------------------------------
Resolution: Won't Fix
[jira] Updated: (HADOOP-3052) distch -- tool to do parallel ch*
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-3052:
----------------------------------
Fix Version/s: (was: 0.16.2)
0.17.0
This isn't a bug fix, so moving to 0.17.
[jira] Commented: (HADOOP-3052) distch -- tool to do parallel ch*
Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583110#action_12583110 ]
Tsz Wo (Nicholas), SZE commented on HADOOP-3052:
------------------------------------------------
> Wouldn't this be a DDoS attack on the namenode?
I have written a distch map/reduce program for testing. You are right that it makes the NameNode very busy when the number of files/dirs is huge. The question is: how should we prevent DDoS? Any user could simply write a program to launch a huge number of accesses.
[jira] Commented: (HADOOP-3052) distch -- tool to do parallel ch*
Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583116#action_12583116 ]
Raghu Angadi commented on HADOOP-3052:
--------------------------------------
> This would have the advantage over the shell -R commands in that name node syncs from multiple clients are effectively batched.
For this advantage, does the program need to be map/reduce? The client just needs to run multiple threads. As long as the number of client threads is comparable to the number of handlers in the NameNode, it goes as fast as it can.
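The multi-threaded client idea above can be sketched roughly as follows. This is only an illustration, not the attached patch and not a Hadoop API: FakeNameNode, set_permission, parallel_chmod and the thread count of 10 are all hypothetical names and values, and the "NameNode" here is just an in-memory dictionary standing in for the real RPC target.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class FakeNameNode:
    """In-memory stand-in for a NameNode (hypothetical, NOT a Hadoop API)."""

    def __init__(self, paths):
        self._lock = threading.Lock()
        # Every path starts with the same permission bits.
        self.modes = {p: 0o644 for p in paths}

    def set_permission(self, path, mode):
        # In a real cluster this would be one RPC per path.
        with self._lock:
            self.modes[path] = mode


def parallel_chmod(namenode, paths, mode, num_threads=10):
    """Apply `mode` to every path using a fixed pool of client threads.

    num_threads is an assumption; the suggestion in the thread is to keep
    it comparable to the number of NameNode handlers.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for all calls and re-raises any worker exception.
        list(pool.map(lambda p: namenode.set_permission(p, mode), paths))
    return len(paths)


paths = [f"/user/data/file-{i}" for i in range(1000)]
nn = FakeNameNode(paths)
changed = parallel_chmod(nn, paths, 0o750, num_threads=10)
print(changed, "paths updated")
```

A single client with a bounded pool keeps the number of concurrent requests fixed, which is the property the comment relies on. Whether that is enough for a very large namespace is exactly what the rest of the thread debates.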
[jira] Commented: (HADOOP-3052) distch -- tool to do parallel ch*
Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583150#action_12583150 ]
Allen Wittenauer commented on HADOOP-3052:
------------------------------------------
We estimated that it would take over a week with single-threaded chowns to set permissions on one of our bigger clusters.
Using the test distch code, we're seeing timings like 1 hour 9 minutes, 33 seconds for 198382 files using 100 nodes.
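For reference, the timing quoted above works out to roughly 47.5 files per second across the whole job, or about 21 ms per file. The short calculation below uses only the figures from this comment. The per-node rate simply divides by the 100 nodes and assumes the work was spread evenly, which the comment does not state.

```python
# Figures quoted in the comment above.
elapsed_s = 1 * 3600 + 9 * 60 + 33   # 1 hour 9 minutes 33 seconds
files = 198382
nodes = 100

files_per_second = files / elapsed_s
ms_per_file = 1000 * elapsed_s / files
# Assumes an even spread of work over the nodes (not stated in the thread).
files_per_node_per_second = files_per_second / nodes

print(f"elapsed: {elapsed_s} s")
print(f"aggregate rate: {files_per_second:.2f} files/s")
print(f"per file: {ms_per_file:.2f} ms")
print(f"per node: {files_per_node_per_second:.3f} files/s")
```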