Posted to common-issues@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2010/04/27 01:35:33 UTC

[jira] Commented: (HADOOP-6678) Remove FileContext#isFile, isDirectory and exists

    [ https://issues.apache.org/jira/browse/HADOOP-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861169#action_12861169 ] 

Hairong Kuang commented on HADOOP-6678:
---------------------------------------

Hi Eli, sorry for the delay in getting back to this issue. I was on vacation and then had to work on a cluster emergency after I came back.

But I have had some time to think more about the proposed FileContext change. I am quite concerned about removing FileContext#exists. It is a useful, often-used method that is not easy for a user to write. Do you think it would be a good idea to keep this method but move it to FileContext#Util, with a comment saying "be cautious about using this method" and listing a few scenarios in which calling it can be avoided?
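
To make the suggestion concrete, here is a rough sketch of what such a helper could look like. It is only an illustration: ExistsUtilSketch and FileStatusStub are hypothetical stand-ins rather than the real FileContext and FileStatus, paths are plain strings, and the scenarios named in the cautionary comment are examples of the kind of advice meant, not agreed wording. The one behavior it relies on is that getFileStatus reports a missing path by throwing FileNotFoundException.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FileContext, for illustration only.
class ExistsUtilSketch {
    // Hypothetical stand-in for FileStatus.
    static final class FileStatusStub {
        private final boolean file;
        FileStatusStub(boolean file) { this.file = file; }
        boolean isFile() { return file; }
    }

    private final Map<String, FileStatusStub> entries = new HashMap<>();
    private int statusCalls = 0;

    void add(String path, boolean isFile) {
        entries.put(path, new FileStatusStub(isFile));
    }

    // Number of getFileStatus lookups made so far.
    int statusCalls() { return statusCalls; }

    // Models getFileStatus: one lookup per call, throws if the path is absent.
    FileStatusStub getFileStatus(String path) throws FileNotFoundException {
        statusCalls++;
        FileStatusStub status = entries.get(path);
        if (status == null) {
            throw new FileNotFoundException(path);
        }
        return status;
    }

    // Utility holder, mirroring the idea of moving exists out of the core API.
    final class Util {
        /**
         * Be cautious about using this method: every call performs a full
         * getFileStatus lookup. Example scenarios where it can be avoided:
         * if the path is about to be opened anyway, open it and handle
         * FileNotFoundException; if isFile or isDirectory is also needed,
         * call getFileStatus once and reuse the returned status.
         */
        boolean exists(String path) {
            try {
                getFileStatus(path);
                return true;
            } catch (FileNotFoundException e) {
                return false;
            }
        }
    }

    Util util() { return new Util(); }

    public static void main(String[] args) {
        ExistsUtilSketch fc = new ExistsUtilSketch();
        fc.add("/tmp/data", true);
        System.out.println(fc.util().exists("/tmp/data"));    // true
        System.out.println(fc.util().exists("/tmp/missing")); // false
        System.out.println(fc.statusCalls());                 // 2
    }
}
```

The part that is easy to get wrong when users write this themselves is the try/catch around getFileStatus, which is the main argument for keeping a shared helper.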

> Remove FileContext#isFile, isDirectory and exists
> -------------------------------------------------
>
>                 Key: HADOOP-6678
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6678
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Hairong Kuang
>            Assignee: Eli Collins
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: hadoop-6678-1.patch
>
>
> # Add a method Iterator<FileStatus> listStatus(Path), which allows an HDFS client to avoid holding the whole listing in memory and to benefit more from the iterative listing added in HDFS-985. Move the current FileStatus[] listStatus(Path) to be a utility method.
> # Remove methods isFile(Path), isDirectory(Path), and exists.
> All these methods are implemented by calling getFileStatus(Path). But most users are not aware of this. They would write code like the following:
> {code}
>   FileContext fc = ..;
>   if (fc.exists(path)) {
>     if (fc.isFile(path)) {
>      ...
>     } else {
>     ...
>     }
>   }
> {code}
> The above code issues an unnecessary getFileInfo RPC to the NameNode. In our production clusters, we often see that the number of getFileStatus calls is several times the number of open calls. If we remove isFile, isDirectory, and exists from FileContext, users have to call getFileStatus explicitly first, so they are more likely to write more efficient code, as follows:
> {code}
>   FileContext fc = ...;
>   FileStatus fstatus = fc.getFileStatus(path);
>   if (fstatus.isFile()) {
>     ...
>   } else {
>     ...
>   }
> {code}
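
The difference between the two patterns quoted above can be counted with a small self-contained sketch. StatusCallCount is a hypothetical stand-in, not the real FileContext: it assumes, as the description states, that exists and isFile are each implemented by calling getFileStatus, and it counts those lookups locally instead of issuing RPCs. It also assumes a missing path is reported with FileNotFoundException, which the single-call version has to handle explicitly.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FileContext that counts status lookups.
class StatusCallCount {
    enum Kind { FILE, DIRECTORY }

    private final Map<String, Kind> namespace = new HashMap<>();
    private int lookups = 0;

    void put(String path, Kind kind) { namespace.put(path, kind); }

    int lookups() { return lookups; }

    // Models getFileStatus: one lookup per call, throws if the path is absent.
    Kind getFileStatus(String path) throws FileNotFoundException {
        lookups++;
        Kind kind = namespace.get(path);
        if (kind == null) {
            throw new FileNotFoundException(path);
        }
        return kind;
    }

    // Convenience methods, each built on its own getFileStatus call.
    boolean exists(String path) {
        try {
            getFileStatus(path);
            return true;
        } catch (FileNotFoundException e) {
            return false;
        }
    }

    boolean isFile(String path) {
        try {
            return getFileStatus(path) == Kind.FILE;
        } catch (FileNotFoundException e) {
            return false;
        }
    }

    // First pattern: exists followed by isFile, two lookups for an existing path.
    static String describeNaive(StatusCallCount fc, String path) {
        if (fc.exists(path)) {
            return fc.isFile(path) ? "file" : "directory";
        }
        return "missing";
    }

    // Second pattern: a single getFileStatus call, with the missing case handled.
    static String describeSingle(StatusCallCount fc, String path) {
        try {
            return fc.getFileStatus(path) == Kind.FILE ? "file" : "directory";
        } catch (FileNotFoundException e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        StatusCallCount naive = new StatusCallCount();
        naive.put("/tmp/part-0", Kind.FILE);
        describeNaive(naive, "/tmp/part-0");

        StatusCallCount single = new StatusCallCount();
        single.put("/tmp/part-0", Kind.FILE);
        describeSingle(single, "/tmp/part-0");

        System.out.println("naive lookups:  " + naive.lookups());
        System.out.println("single lookups: " + single.lookups());
    }
}
```

In this model the first pattern makes two lookups for an existing path and the second makes one. How that maps to RPC counts on a real cluster depends on the client implementation.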
