Posted to common-issues@hadoop.apache.org by "Alex Kozlov (JIRA)" <ji...@apache.org> on 2010/07/12 08:02:49 UTC

[jira] Created: (HADOOP-6857) FsShell should report raw disk usage including replication factor

FsShell should report raw disk usage including replication factor
-----------------------------------------------------------------

                 Key: HADOOP-6857
                 URL: https://issues.apache.org/jira/browse/HADOOP-6857
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs
    Affects Versions: 0.20.2
            Reporter: Alex Kozlov


Currently FsShell reports HDFS usage with the "hadoop fs -dus <path>" command.  Since the replication level is set per file, it would be nice to add raw disk usage including the replication factor (maybe "hadoop fs -dus -raw <path>"?).  This would allow resource usage to be assessed more accurately.  -- Alex K


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909788#action_12909788 ] 

Eli Collins commented on HADOOP-6857:
-------------------------------------

I agree with Koji: raw disk space usage should be easy to get from the CLI, and you shouldn't have to enable a quota on a directory to see it. For example, nothing in the output below indicates raw disk usage:

{code}
~ $ hadoop fs -mkdir dir
~ $ hadoop fs -put f3mb dir
~ $ hadoop fs -dus dir
hdfs://haus01.sf.cloudera.com:10020/user/eli/dir	3145728
~ $ hadoop fs -count -q dir
        none             inf            none             inf            1            1            3145728 hdfs://haus01.sf.cloudera.com:10020/user/eli/dir
{code}

It also sounds like the original issue here was that disk space usage should not have been added to "count" if it was already available in "dus". Even if we added raw disk usage to "count", it's wonky that a user can get logical disk usage with "du" but has to switch over to a separate "count" command to get raw disk usage. Why not make all disk usage available from a single command?

Sound reasonable, Nicholas?




[jira] Reopened: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi reopened HADOOP-6857:
----------------------------------


I think this number (raw usage) would be helpful.  I'm not sure whether it should be in -du or -count, and whether it should be shown by default or as an option.



[jira] Commented: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909478#action_12909478 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-6857:
------------------------------------------------

"fs -count" was introduced for counting name objects.  The disk space column was added later on.



[jira] Commented: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889481#action_12889481 ] 

Hadoop QA commented on HADOOP-6857:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449749/show-space-consumed.txt
  against trunk revision 964993.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/622/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/622/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/622/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/622/console

This message is automatically generated.



[jira] Commented: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890070#action_12890070 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-6857:
------------------------------------------------

We already have "fs -count <path>" which counts bytes including replications. Is it good enough?



[jira] Updated: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HADOOP-6857:
--------------------------------

         Hadoop Flags: [Incompatible change]
        Fix Version/s: 0.22.0
    Affects Version/s:     (was: 0.20.2)

Hey Aaron,

Patch looks good.  Mind creating o.a.h.fs.TestFsShell.java and adding a test that shows files with two different replication levels work? (You might need to mock up the replication level.)  Also, please test that TestShell and TestHDFSCLI in HDFS still pass for sanity.

Wrt rationale, I think this change is kosher since FileStatus#getReplication is not hdfs-specific.

Marking the jira as an incompatible change since IIRC the FsShell is considered a public API.

Thanks,
Eli



[jira] Updated: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Aaron T. Myers (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HADOOP-6857:
-----------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Maxim Veksler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909693#action_12909693 ] 

Maxim Veksler commented on HADOOP-6857:
---------------------------------------

Is there an easy CLI method of obtaining this information?

If not, then this would be a welcome feature in several use cases (like a file server based on HDFS).



[jira] Resolved: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE resolved HADOOP-6857.
--------------------------------------------

    Resolution: Won't Fix

Closing this.  Thanks.



[jira] Commented: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909502#action_12909502 ] 

Koji Noguchi commented on HADOOP-6857:
--------------------------------------

A little confused.  I thought "fs -count" shows the same hdfs usage as "fs -du", in the third column.

{noformat}
[knoguchi ~]$ hadoop dfs -dus /user/knoguchi
hdfs://abc-nn1.com/user/knoguchi       2603203340273
[knoguchi ~]$ hadoop dfs -count /user/knoguchi
        1580        20624      2603203340273 hdfs://abc-nn1.com/user/knoguchi
[knoguchi ~]$ 
{noformat}
If quota is enabled on that dir and "-q" is passed, it would show the remaining raw space available. 
{noformat}
[knoguchi ~]$ hadoop dfs -count -q /user/knoguchi
       50000           27796  13194139533312   5384528402193         1580        20624      2603203340273 hdfs://abc-nn1.com/user/knoguchi
[knoguchi ~]$ 
{noformat}
You can get the raw space usage then (quota - raw_remaining).
However *this is only if you have quota enabled on that particular dir*.
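
The (quota - raw_remaining) arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of FsShell: the function name is hypothetical, and the column order (name quota, remaining name quota, space quota, remaining space quota, dir count, file count, content size, path) is an assumption read off the "-count -q" outputs quoted in this thread.

```python
from typing import Optional


def raw_usage_from_count_q(line: str) -> Optional[int]:
    """Derive raw (replicated) bytes from one line of `-count -q` output.

    Hypothetical helper; assumes the eight-column layout seen in this thread.
    Returns None when no space quota is set, because the remaining-space
    column then carries no usable number.
    """
    fields = line.split(None, 7)
    if len(fields) != 8:
        raise ValueError("expected 8 whitespace-separated columns")
    space_quota, remaining_space = fields[2], fields[3]
    if space_quota == "none" or remaining_space == "inf":
        return None
    # Space quota is charged in raw bytes, so the difference is raw usage.
    return int(space_quota) - int(remaining_space)


# Line taken from the quota-enabled example above.
with_quota = (
    "50000 27796 13194139533312 5384528402193 "
    "1580 20624 2603203340273 hdfs://abc-nn1.com/user/knoguchi"
)
# Same layout for a directory that has no quota configured.
without_quota = (
    "none inf none inf 1 1 3145728 "
    "hdfs://haus01.sf.cloudera.com:10020/user/eli/dir"
)

print(raw_usage_from_count_q(with_quota))
print(raw_usage_from_count_q(without_quota))
```

For the quota-enabled directory the result is roughly three times the content size in the seventh column, which is what one would expect if most files use a replication factor of 3. For the directory without a quota there is nothing to subtract from, which is the limitation pointed out above.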




[jira] Updated: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Aaron T. Myers (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HADOOP-6857:
-----------------------------------

    Attachment: show-space-consumed.txt

This patch adds a new column to the output of "hadoop fs -du" and "hadoop fs -dus" which shows the disk space consumed (file size * per-file replication factor) of the paths matched by these commands.
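
As a rough illustration of the "file size * per-file replication factor" calculation (this is not the patch itself, which is in the attachment), here is a minimal Python sketch. The FileEntry type and the sample paths are hypothetical stand-ins for whatever per-file metadata a listing returns. The result is a nominal figure and may differ from what is physically on disk, for example if some blocks are under-replicated.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for per-file metadata from a directory listing."""
    path: str
    length: int        # logical file size in bytes
    replication: int   # per-file replication factor


def disk_usage(entries: Iterable[FileEntry]) -> Tuple[int, int]:
    """Return (logical_bytes, raw_bytes) summed over all entries."""
    logical = 0
    raw = 0
    for entry in entries:
        logical += entry.length
        # Each file is weighted by its own replication factor, since
        # replication is set per file rather than per directory.
        raw += entry.length * entry.replication
    return logical, raw


# Two made-up files with different replication levels.
files = [
    FileEntry("dir/a", 3 * 1024 * 1024, 3),
    FileEntry("dir/b", 1 * 1024 * 1024, 2),
]
logical, raw = disk_usage(files)
print(logical, raw)
```

Because the factor is applied per file, a directory mixing replication levels cannot be handled by multiplying the logical total by a single number.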



[jira] Updated: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HADOOP-6857:
--------------------------------

    Status: Open  (was: Patch Available)

Canceling patch until Nicholas' comments are addressed.



[jira] Commented: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909472#action_12909472 ] 

Eli Collins commented on HADOOP-6857:
-------------------------------------

Why do we have separate "dus" and "count" commands?  They seem to duplicate each other.  Since the CLI needs to be backwards compatible, I'm not suggesting we remove one, but if there are no significant differences perhaps we should update the hdfs_shell docs to make it clear that they display the same info.

{code}
~ $ hadoop fs -dus /user/eli
hdfs://haus01.sf.cloudera.com:10020/user/eli	86183666860
{code}

{code}
~ $ hadoop fs -count /user/eli
           7           51        86183666860 hdfs://haus01.sf.cloudera.com:10020/user/eli
~ $ 
{code}



[jira] Commented: (HADOOP-6857) FsShell should report raw disk usage including replication factor

Posted by "Aaron T. Myers (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909368#action_12909368 ] 

Aaron T. Myers commented on HADOOP-6857:
----------------------------------------

bq. We already have "fs -count <path>" which counts bytes including replications. Is it good enough?

"fs -count <path>" is indeed sufficient. Feel free to close this ticket.
