Posted to common-dev@hadoop.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2009/03/27 01:19:50 UTC

[jira] Created: (HADOOP-5588) hadoop commands seem extremely slow in 0.20 branch

hadoop commands seem extremely slow in 0.20 branch
--------------------------------------------------

                 Key: HADOOP-5588
                 URL: https://issues.apache.org/jira/browse/HADOOP-5588
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs, fs
         Environment: 0.20-branch and trunk
            Reporter: Koji Noguchi
            Priority: Blocker


hadoop dfs -get/rm/mkdir/etc   mydir/fileA mydir/fileB mydir/fileC ...

seem to be very slow in 0.20 branch. 
 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5588) hadoop commands seem extremely slow in 0.20 branch

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693864#action_12693864 ] 

Hairong Kuang commented on HADOOP-5588:
---------------------------------------

Koji did some experiments with the patch. He is too busy to post the results. I am doing this for him.

Test setup: one directory containing 10,000 files.
About 450 mappers, each calling dfs -get 10,000 times.

Without the fix, the namenode showed 20-30 getblocklocations per second and 30-40 blocked threads.
With the fix, it showed 600 getblocklocations per second and almost no blocked threads.
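
To put these figures in proportion, here is a rough back-of-the-envelope calculation. It is only a sketch: the variable names are made up for illustration, and it assumes that without the fix every dfs -get triggers one full listing of the 10,000-entry parent directory, which is the suspected cause discussed in this issue rather than something measured here.

```python
# Back-of-the-envelope numbers taken from the experiment reported above.
FILES_IN_DIR = 10_000      # files in the test directory
MAPPERS = 450              # approximate number of mappers
GETS_PER_MAPPER = 10_000   # dfs -get calls per mapper

RATE_BEFORE = (20, 30)     # getblocklocations per second without the fix
RATE_AFTER = 600           # getblocklocations per second with the fix

total_gets = MAPPERS * GETS_PER_MAPPER

# Assumption: without the fix, every get lists the whole parent directory once.
entries_listed_before = total_gets * FILES_IN_DIR

# Throughput ratio at both ends of the reported range.
speedup_low = RATE_AFTER / RATE_BEFORE[1]
speedup_high = RATE_AFTER / RATE_BEFORE[0]

print(f"total dfs -get calls:        {total_gets:,}")
print(f"directory entries returned:  {entries_listed_before:,}")
print(f"throughput improvement:      {speedup_low:.0f}x to {speedup_high:.0f}x")
```

Under that assumption the job issues 4.5 million gets, which would make the namenode return 45 billion directory entries, and the reported rates correspond to a 20x to 30x throughput improvement.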

> hadoop commands seem extremely slow in 0.20 branch
> --------------------------------------------------
>
>                 Key: HADOOP-5588
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5588
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs, fs
>    Affects Versions: 0.20.0
>         Environment: 0.20-branch and trunk
>            Reporter: Koji Noguchi
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.20.0, 0.21.0
>
>         Attachments: globStatus.patch, globStatus1.patch
>
>
> hadoop dfs get, rm, -mkdir- ,cp, mv, ls, etc   mydir/fileA mydir/fileB mydir/fileC ...
> seem to be very slow in 0.20 branch. 
>  



[jira] Commented: (HADOOP-5588) hadoop commands seem extremely slow in 0.20 branch

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690016#action_12690016 ] 

Hairong Kuang commented on HADOOP-5588:
---------------------------------------

I was not able to run all unit tests, but all fs/dfs-related unit tests passed.



[jira] Resolved: (HADOOP-5588) hadoop commands seem extremely slow in 0.20 branch

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE resolved HADOOP-5588.
--------------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0

I have committed this to 0.20 and above.  Thanks, Hairong!



[jira] Updated: (HADOOP-5588) hadoop commands seem extremely slow in 0.20 branch

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-5588:
-------------------------------------------

    Hadoop Flags: [Reviewed]

{noformat}
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
{noformat}



[jira] Commented: (HADOOP-5588) hadoop commands seem extremely slow in 0.20 branch

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695422#action_12695422 ] 

Hudson commented on HADOOP-5588:
--------------------------------

Integrated in Hadoop-trunk #796 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/796/])
    



[jira] Updated: (HADOOP-5588) hadoop commands seem extremely slow in 0.20 branch

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5588:
----------------------------------

    Attachment: globStatus1.patch

This patch fixes a bug in the previous patch.



[jira] Commented: (HADOOP-5588) hadoop commands seem extremely slow in 0.20 branch

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690018#action_12690018 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-5588:
------------------------------------------------

+1 patch looks good.

I tested manually and ran some related tests.  Everything has worked fine.



[jira] Updated: (HADOOP-5588) hadoop commands seem extremely slow in 0.20 branch

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated HADOOP-5588:
---------------------------------

    Description: 
hadoop dfs get, rm, -mkdir- ,cp, mv, ls, etc   mydir/fileA mydir/fileB mydir/fileC ...

seem to be very slow in 0.20 branch. 
 

  was:
hadoop dfs -get/rm/mkdir/etc   mydir/fileA mydir/fileB mydir/fileC ...

seem to be very slow in 0.20 branch. 
 




[jira] Updated: (HADOOP-5588) hadoop commands seem extremely slow in 0.20 branch

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5588:
----------------------------------

    Affects Version/s: 0.20.0
        Fix Version/s: 0.20.0
             Assignee: Hairong Kuang

A suspect is HADOOP-3497, which introduced a listing call on the parent directory in globStatus whether or not the path contains globs. One of our users calls "dfs -get" on many small files under one dir. This has the same effect as calling dfs -ls many times on a large directory, causing the NN to do a lot of GC and making it less responsive.
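
The cost difference can be illustrated with a toy model. This is a sketch only, not the Hadoop code: FakeNameNode, glob_status_always_list and glob_status_direct are hypothetical names invented here, and the model simply counts how many entries the server hands back per lookup.

```python
class FakeNameNode:
    """Toy stand-in for the NameNode that counts the entries it returns."""

    def __init__(self, files):
        self.files = set(files)      # full paths, e.g. "mydir/file7"
        self.entries_returned = 0

    def list_status(self, directory):
        # Returns every child of the directory, like dfs -ls.
        children = [f for f in self.files if f.rsplit("/", 1)[0] == directory]
        self.entries_returned += len(children)
        return children

    def get_file_status(self, path):
        # Single-path lookup, counted as one entry whether or not it exists.
        self.entries_returned += 1
        return path if path in self.files else None


def glob_status_always_list(nn, path):
    """Hypothetical model of the suspected behaviour: list the parent, then filter."""
    parent, name = path.rsplit("/", 1)
    return [f for f in nn.list_status(parent) if f.rsplit("/", 1)[1] == name]


def glob_status_direct(nn, path):
    """Hypothetical model of a literal path resolved without any listing."""
    status = nn.get_file_status(path)
    return [status] if status is not None else []


files = [f"mydir/file{i}" for i in range(10_000)]
wanted = ["mydir/file1", "mydir/file2", "mydir/file3"]

slow = FakeNameNode(files)
for p in wanted:
    glob_status_always_list(slow, p)

fast = FakeNameNode(files)
for p in wanted:
    glob_status_direct(fast, p)

print(slow.entries_returned, fast.entries_returned)
```

In this model three lookups against a 10,000-file directory return 30,000 entries when the parent is always listed, and 3 entries when each literal path is looked up directly.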



[jira] Updated: (HADOOP-5588) hadoop commands seem extremely slow in 0.20 branch

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5588:
----------------------------------

    Attachment: globStatus.patch

This patch restores pre-0.20.0 behavior.



[jira] Commented: (HADOOP-5588) hadoop commands seem extremely slow in 0.20 branch

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12690014#action_12690014 ] 

Hairong Kuang commented on HADOOP-5588:
---------------------------------------

Manual tests on dfs -ls/get etc. showed that the patch removed the additional listing call to the parent directory if the input path did not contain a glob.
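
The decision described here, listing the parent only when the path actually contains a glob, could be sketched roughly as follows. This is not the code from the attached patch: has_glob and lookup_plan are hypothetical helpers, the metacharacter set follows the glob syntax documented for FileSystem.globStatus, and treating a backslash as escaping the next character is an assumption about how a literal path would be recognised.

```python
# Metacharacters of Hadoop-style glob patterns: ? * [ ] { }
GLOB_CHARS = set("*?[]{}")


def has_glob(path):
    """Return True if the path contains an unescaped glob metacharacter."""
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "\\":
            i += 2          # assumption: backslash escapes the next character
            continue
        if ch in GLOB_CHARS:
            return True
        i += 1
    return False


def lookup_plan(path):
    """Hypothetical dispatcher: which kind of NameNode call the path needs."""
    return "list parent directory" if has_glob(path) else "single file lookup"


for p in ["mydir/fileA", "mydir/file*", "mydir/file[AB]", "mydir/file\\*"]:
    print(f"{p!r}: {lookup_plan(p)}")
```

A real implementation would also have to strip the escape characters before doing the single lookup; the sketch leaves that out to keep the decision itself visible.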



[jira] Issue Comment Edited: (HADOOP-5588) hadoop commands seem extremely slow in 0.20 branch

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689759#action_12689759 ] 

Hairong Kuang edited comment on HADOOP-5588 at 3/27/09 10:02 AM:
-----------------------------------------------------------------

A suspect is HADOOP-3497, which introduced a listing call on the parent directory in globStatus whether or not the path contains globs. One of our users calls "dfs -get" on many small files under one large directory. This has the same effect as calling dfs -ls many times on the large directory, causing the NN to do a lot of GC and making it less responsive.

      was (Author: hairong):
    A suspect is HADOOP-3497 which introduced a listing call on the parent directory no matter the path contains globs or not in globStatus. One of our users calls "dfs -get" on many small files under one dir. It has the same effect of calling dfs -ls many times on a large directory, thus causing NN to do lots of gc and making it less responsive.
  