Posted to common-dev@hadoop.apache.org by "Qi Liu (JIRA)" <ji...@apache.org> on 2009/01/30 00:31:59 UTC

[jira] Created: (HADOOP-5143) Add a method to get file length for Seekable, FSDataInputStream and libhdfs

Add a method to get file length for Seekable, FSDataInputStream and libhdfs
---------------------------------------------------------------------------

                 Key: HADOOP-5143
                 URL: https://issues.apache.org/jira/browse/HADOOP-5143
             Project: Hadoop Core
          Issue Type: New Feature
          Components: dfs, fs, libhdfs
    Affects Versions: 0.18.2
            Reporter: Qi Liu


When any seekable file is opened, it should be possible to get the length of the file via the Seekable interface, since the seek method must already be able to detect seeking beyond the end of the file. Such an interface can benefit distributed file systems by saving the network round trip of calling FileSystem.getFileStatus(Path).getLen() for any open file.
The same interface should also be exposed in libhdfs so that native programs can take advantage of this change.
I have the changes locally for all concrete FSInputStream classes. The change can be considered trivial, since some of the FSInputStream classes already have a method named getFileLength() or a member field named size/length/end.
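A minimal sketch of the idea, assuming an in-memory stream: a stream that must reject seeks past the end of the file already knows its length, so exposing that length costs nothing extra. InMemorySeekableStream is a hypothetical class for illustration, not Hadoop's FSInputStream, and allowing a seek to exactly the file length (but not past it) is an assumption of the sketch.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical in-memory stream (not Hadoop's FSInputStream). It tracks its
 * length in order to reject seeks past EOF, so it can also expose that length.
 */
class InMemorySeekableStream extends InputStream {
    private final byte[] data;
    private long pos = 0;

    InMemorySeekableStream(byte[] data) {
        this.data = data;
    }

    /** Proposed accessor: the length the stream already knows. */
    long getFileLength() {
        return data.length;
    }

    long getPos() {
        return pos;
    }

    /** Uses the known length to reject seeks beyond the end of the file. */
    void seek(long target) throws IOException {
        if (target < 0 || target > getFileLength()) {
            throw new EOFException("Cannot seek to " + target
                    + "; file length is " + getFileLength());
        }
        pos = target;
    }

    @Override
    public int read() {
        if (pos >= data.length) {
            return -1; // end of stream
        }
        return data[(int) pos++] & 0xff;
    }
}
```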

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5143) Add a method to get file length for Seekable, FSDataInputStream and libhdfs

Posted by "Qi Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Liu updated HADOOP-5143:
---------------------------

    Attachment: HADOOP-5143-2.patch

Moved getFileLength() out of Seekable, into FSInputStream and FSDataInputStream.
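A sketch of the revised layout, using hypothetical stand-in types rather than Hadoop's real classes: the seek-only interface (SeekOnly here) keeps just seek() and getPos(), the abstract stream base class (BaseFileStream) carries getFileLength(), and a wrapper (WrappingStream) simply delegates to the stream it wraps.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical seek-only interface: deliberately has no length method. */
interface SeekOnly {
    void seek(long pos) throws IOException;
    long getPos() throws IOException;
}

/** Hypothetical base stream class that carries the length accessor instead. */
abstract class BaseFileStream implements SeekOnly {
    abstract long getFileLength() throws IOException;
}

/** Hypothetical wrapper that forwards every call, including getFileLength(). */
class WrappingStream implements SeekOnly {
    private final BaseFileStream in;

    WrappingStream(BaseFileStream in) { this.in = in; }

    public void seek(long pos) throws IOException { in.seek(pos); }
    public long getPos() throws IOException { return in.getPos(); }
    long getFileLength() throws IOException { return in.getFileLength(); }
}

/** Trivial fixed-length implementation used only for illustration. */
class FixedLengthStream extends BaseFileStream {
    private final long length;
    private long pos;

    FixedLengthStream(long length) { this.length = length; }

    public void seek(long target) throws IOException {
        if (target < 0 || target > length) {
            throw new EOFException("Cannot seek to " + target);
        }
        pos = target;
    }

    public long getPos() { return pos; }
    long getFileLength() { return length; }
}
```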

> Add a method to get file length for Seekable, FSDataInputStream and libhdfs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5143
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, fs, libhdfs
>    Affects Versions: 0.18.2
>            Reporter: Qi Liu
>         Attachments: HADOOP-5143-2.patch, HADOOP-5143.patch, hadoop.patch


[jira] Commented: (HADOOP-5143) Add a method to get file length for Seekable, FSDataInputStream and libhdfs

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698952#action_12698952 ] 

Raghu Angadi commented on HADOOP-5143:
--------------------------------------

I don't see any need to add more not-so-related methods to interfaces. getLength() is already available through various other calls. Seekable just implies that users can call seek(). Adding other utility methods here does not seem very useful.

What is the use case?

> Add a method to get file length for Seekable, FSDataInputStream and libhdfs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5143
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, fs, libhdfs
>    Affects Versions: 0.18.2
>            Reporter: Qi Liu
>         Attachments: HADOOP-5143.patch, hadoop.patch


[jira] Commented: (HADOOP-5143) Add a method to get file length for Seekable, FSDataInputStream and libhdfs

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12698337#action_12698337 ] 

dhruba borthakur commented on HADOOP-5143:
------------------------------------------

This is a good change! 

Can you please merge this patch with trunk and attach a new diff file? Also, it would be nice if you could generate the patch from the base of the workspace (as described in http://wiki.apache.org/hadoop/HowToContribute). It would be great if you could also add a unit test (possibly in TestFileCreation.java).

> Add a method to get file length for Seekable, FSDataInputStream and libhdfs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5143
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, fs, libhdfs
>    Affects Versions: 0.18.2
>            Reporter: Qi Liu
>         Attachments: hadoop.patch


[jira] Commented: (HADOOP-5143) Add a method to get file length for Seekable, FSDataInputStream and libhdfs

Posted by "Qi Liu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704338#action_12704338 ] 

Qi Liu commented on HADOOP-5143:
--------------------------------

available() in Hadoop 0.18.3 reports negative numbers if the file size is over 2GB, which is obviously a bug. available() should always return a number greater than or equal to 0, agreed?
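To illustrate how a negative value can arise, here is a hypothetical helper (not the actual Hadoop code) that computes the remaining bytes as a long and narrows them to int, as available()'s int return type forces. A narrowing cast keeps only the low 32 bits, so 3 GiB remaining wraps to a negative number, while a 5 GiB remainder would wrap to a positive but wrong value.

```java
/** Illustration only: how a long-to-int narrowing cast corrupts available(). */
class AvailableOverflowDemo {
    /** Buggy pattern: the cast silently wraps values above Integer.MAX_VALUE. */
    static int naiveAvailable(long fileLength, long pos) {
        return (int) (fileLength - pos);
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024; // 3,221,225,472 bytes
        System.out.println(naiveAvailable(threeGiB, 0));
    }
}
```

Running main prints -1073741824.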

> Add a method to get file length for Seekable, FSDataInputStream and libhdfs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5143
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, fs, libhdfs
>    Affects Versions: 0.18.2
>            Reporter: Qi Liu
>         Attachments: HADOOP-5143-2.patch, HADOOP-5143.patch, hadoop.patch


[jira] Updated: (HADOOP-5143) Add a method to get file length for Seekable, FSDataInputStream and libhdfs

Posted by "Qi Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Liu updated HADOOP-5143:
---------------------------

    Attachment: hadoop.patch

Attaching a patch to make the getFileLength interface public in both Java and libhdfs.

> Add a method to get file length for Seekable, FSDataInputStream and libhdfs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5143
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, fs, libhdfs
>    Affects Versions: 0.18.2
>            Reporter: Qi Liu
>         Attachments: hadoop.patch


[jira] Commented: (HADOOP-5143) Add a method to get file length for Seekable, FSDataInputStream and libhdfs

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704424#action_12704424 ] 

Hong Tang commented on HADOOP-5143:
-----------------------------------

Agreed. Let's return min(Integer.MAX_VALUE, length).
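A sketch of the suggested clamping, under the assumption that available() reports the bytes remaining from the current position (length - pos), which equals min(Integer.MAX_VALUE, length) when the stream is at position 0. ClampedAvailable is a hypothetical helper, and the extra Math.max guard, which keeps the result non-negative if the position were ever past the end, is also an assumption.

```java
/** Hypothetical helper: a clamped, never-negative available() computation. */
class ClampedAvailable {
    static int available(long fileLength, long pos) {
        // Remaining bytes from the current position, floored at zero.
        long remaining = Math.max(0L, fileLength - pos);
        // Clamp to the largest value an int return type can represent.
        return (int) Math.min((long) Integer.MAX_VALUE, remaining);
    }
}
```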


> Add a method to get file length for Seekable, FSDataInputStream and libhdfs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5143
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, fs, libhdfs
>    Affects Versions: 0.18.2
>            Reporter: Qi Liu
>         Attachments: HADOOP-5143-2.patch, HADOOP-5143.patch, hadoop.patch


[jira] Commented: (HADOOP-5143) Add a method to get file length for Seekable, FSDataInputStream and libhdfs

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699194#action_12699194 ] 

dhruba borthakur commented on HADOOP-5143:
------------------------------------------

Another use case: one opens a file for reading (via FSDataInputStream) and then wants to find the length of the file without making a separate FileStatus RPC to the namenode.
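A toy model of the round trip being saved, assuming the length is already known once the file has been opened. MetadataService and OpenFile are hypothetical stand-ins, not the NameNode or the HDFS client; each lookupLength() call stands in for one RPC.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical metadata service that counts lookups (each one models an RPC). */
class MetadataService {
    private final Map<String, Long> lengths = new HashMap<>();
    int lookups = 0;

    void put(String path, long length) {
        lengths.put(path, length);
    }

    long lookupLength(String path) {
        lookups++; // stands in for one network round trip
        Long len = lengths.get(path);
        if (len == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return len;
    }
}

/** Hypothetical open file that captures its length once, at open time. */
class OpenFile {
    private final long length;

    OpenFile(MetadataService svc, String path) {
        this.length = svc.lookupLength(path); // the lookup needed to open anyway
    }

    long getFileLength() {
        return length; // answered locally, no further lookup
    }
}
```

Asking the open file for its length any number of times leaves the lookup count at one, whereas a separate status call after opening would add a second round trip.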

> Add a method to get file length for Seekable, FSDataInputStream and libhdfs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5143
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, fs, libhdfs
>    Affects Versions: 0.18.2
>            Reporter: Qi Liu
>         Attachments: HADOOP-5143.patch, hadoop.patch


[jira] Commented: (HADOOP-5143) Add a method to get file length for Seekable, FSDataInputStream and libhdfs

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702868#action_12702868 ] 

dhruba borthakur commented on HADOOP-5143:
------------------------------------------

Hi Qi, is it possible for you to add the new method getFileLength() only to FSInputStream and libhdfs (and not to the Seekable interface)? As Raghu points out, the getFileLength() API does not seem to match the goals of the Seekable interface.

You mentioned the shortcomings of the available() interface for files larger than 2GB. Is this something that is fixed in later releases of the JDK?

> Add a method to get file length for Seekable, FSDataInputStream and libhdfs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5143
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, fs, libhdfs
>    Affects Versions: 0.18.2
>            Reporter: Qi Liu
>         Attachments: HADOOP-5143.patch, hadoop.patch


[jira] Commented: (HADOOP-5143) Add a method to get file length for Seekable, FSDataInputStream and libhdfs

Posted by "Qi Liu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699005#action_12699005 ] 

Qi Liu commented on HADOOP-5143:
--------------------------------

Simple. What if I want to seek relative to the end of a file? Also, it is reasonable to have a method that gives the bounds within which calling seek will not cause exceptions.
If the file size is less than 2GB, available() would do the job. However, in many FSInputStream implementations available() does not work properly, and it can even return negative values if the file size exceeds 2GB.
What I really want is an available() that can return a value larger than 2GB (a long). If such an interface existed, the file length could be obtained by seek(0); availableLong();
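A sketch of both requests on a stream that already tracks its length. seekFromEnd() and availableLong() are hypothetical names used only for illustration; they are not existing Hadoop methods. With a long-valued count, seek(0) followed by availableLong() yields the full length even for files larger than 2GB.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical stream that knows its length (not a Hadoop class). */
class EndRelativeStream {
    private final long length;
    private long pos = 0;

    EndRelativeStream(long length) { this.length = length; }

    long getFileLength() { return length; }
    long getPos() { return pos; }

    /** Valid seek targets are 0..length inclusive; anything else throws. */
    void seek(long target) throws IOException {
        if (target < 0 || target > length) {
            throw new EOFException("Cannot seek to " + target + " (length " + length + ")");
        }
        pos = target;
    }

    /** Seek to (length - offsetFromEnd), e.g. to read a trailer at the end of a file. */
    void seekFromEnd(long offsetFromEnd) throws IOException {
        seek(getFileLength() - offsetFromEnd);
    }

    /** Long-valued remaining-bytes count; after seek(0) it equals the file length. */
    long availableLong() {
        return length - pos;
    }
}
```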


> Add a method to get file length for Seekable, FSDataInputStream and libhdfs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5143
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, fs, libhdfs
>    Affects Versions: 0.18.2
>            Reporter: Qi Liu
>         Attachments: HADOOP-5143.patch, hadoop.patch


[jira] Updated: (HADOOP-5143) Add a method to get file length for Seekable, FSDataInputStream and libhdfs

Posted by "Qi Liu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Liu updated HADOOP-5143:
---------------------------

    Attachment: HADOOP-5143.patch

Patch against the Hadoop 0.21-dev trunk.

> Add a method to get file length for Seekable, FSDataInputStream and libhdfs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5143
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, fs, libhdfs
>    Affects Versions: 0.18.2
>            Reporter: Qi Liu
>         Attachments: HADOOP-5143.patch, hadoop.patch


[jira] Commented: (HADOOP-5143) Add a method to get file length for Seekable, FSDataInputStream and libhdfs

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699248#action_12699248 ] 

Raghu Angadi commented on HADOOP-5143:
--------------------------------------

I am not saying getLength() is not useful. It is just that it does not need to be part of Seekable.

> Add a method to get file length for Seekable, FSDataInputStream and libhdfs
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-5143
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5143
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, fs, libhdfs
>    Affects Versions: 0.18.2
>            Reporter: Qi Liu
>         Attachments: HADOOP-5143.patch, hadoop.patch