Posted to common-issues@hadoop.apache.org by "Kannan Rajah (JIRA)" <ji...@apache.org> on 2015/05/03 03:49:13 UTC

[jira] [Updated] (HADOOP-11905) Abstraction for LocalDirAllocator

     [ https://issues.apache.org/jira/browse/HADOOP-11905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Rajah updated HADOOP-11905:
----------------------------------
    Attachment: 0001-Abstraction-for-local-disk-path-allocation.patch

> Abstraction for LocalDirAllocator
> ---------------------------------
>
>                 Key: HADOOP-11905
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11905
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.5.2
>            Reporter: Kannan Rajah
>            Assignee: Kannan Rajah
>             Fix For: 2.7.1
>
>         Attachments: 0001-Abstraction-for-local-disk-path-allocation.patch
>
>
> There are two abstractions used to write data to local disk:
> LocalDirAllocator: Allocates paths from a set of configured local directories.
> LocalFileSystem/RawLocalFileSystem: Reads and writes using java.io.* and java.nio.*
> In the current implementation, local disks are managed by the guest OS and not by HDFS. The proposal is to provide a new abstraction that encapsulates the above two abstractions and hides who manages the local disks. This enables us to provide an alternate implementation where a DFS manages the local disks and they are accessed using HDFS APIs. This means the DFS maintains a namespace for node-local directories and can create paths that are guaranteed to be present on a specific node.
> Here is an example use case for Shuffle: When a mapper writes intermediate data using this new implementation, it will continue to write to local disk. When a reducer needs to access data from a remote node, it can use HDFS APIs with a path that points to that node’s local namespace instead of using an HTTP server to transfer the data across nodes.
> New Abstractions
> 1. LocalDiskPathAllocator
> Interface to get file/directory paths from the local disk namespace.
> This contains all the APIs that are currently supported by LocalDirAllocator. So we just need to change LocalDirAllocator to implement this new interface.
> 2. LocalDiskUtil
> Helper class to get a handle to LocalDiskPathAllocator and the FileSystem that is used to manage those paths.
> By default, it will return LocalDirAllocator and LocalFileSystem.
> A supporting DFS can return DFSLocalDirAllocator and an instance of DFS.
> 3. DFSLocalDirAllocator
> This is a generic implementation. An allocator is created for a specific node. It uses the Configuration object to get the user-configured base directory and appends the node hostname to it. Hence the returned paths are within the node-local namespace.
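
The path scheme described for DFSLocalDirAllocator (base directory plus node hostname) can be sketched roughly as below. This is only an illustration and is not taken from the attached patch: it uses a plain Map in place of Hadoop's Configuration, the configuration key "example.dfs.local.base.dir" and the default "/local-dirs" are made up, and the interface has simplified signatures that only loosely mirror LocalDirAllocator's getLocalPathForWrite and getLocalPathToRead.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and signatures are assumptions, not the patch's API.
final class LocalDiskSketch {

    /** Hypothetical, simplified stand-in for the proposed LocalDiskPathAllocator. */
    interface LocalDiskPathAllocator {
        String getLocalPathForWrite(String relativePath);
        String getLocalPathToRead(String relativePath);
    }

    /** Sketch of a DFS-backed allocator bound to one node. */
    static final class DfsLocalDirAllocatorSketch implements LocalDiskPathAllocator {
        // Made-up configuration key and default, used purely for illustration.
        static final String BASE_DIR_KEY = "example.dfs.local.base.dir";
        static final String DEFAULT_BASE_DIR = "/local-dirs";

        private final String nodeRoot;

        DfsLocalDirAllocatorSketch(Map<String, String> conf, String hostname) {
            String base = conf.getOrDefault(BASE_DIR_KEY, DEFAULT_BASE_DIR);
            // Node-local namespace = base directory with the node hostname appended.
            this.nodeRoot = base.replaceAll("/+$", "") + "/" + hostname;
        }

        @Override
        public String getLocalPathForWrite(String relativePath) {
            return resolve(relativePath);
        }

        @Override
        public String getLocalPathToRead(String relativePath) {
            return resolve(relativePath);
        }

        // Joins the node root and the relative path, dropping any leading slashes.
        private String resolve(String relativePath) {
            return nodeRoot + "/" + relativePath.replaceAll("^/+", "");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DfsLocalDirAllocatorSketch.BASE_DIR_KEY, "/apps/local/");

        // A mapper on node17 allocates a path for its intermediate output.
        LocalDiskPathAllocator onMapper =
            new DfsLocalDirAllocatorSketch(conf, "node17.example.com");
        System.out.println(onMapper.getLocalPathForWrite("shuffle/map_0.out"));

        // A reducer elsewhere builds an allocator for the mapper's host
        // and derives the same DFS path to read from.
        LocalDiskPathAllocator onReducer =
            new DfsLocalDirAllocatorSketch(conf, "node17.example.com");
        System.out.println(onReducer.getLocalPathToRead("/shuffle/map_0.out"));
    }
}
```

In this sketch the mapper and the reducer arrive at the same path because both derive it from the shared base directory and the mapper's hostname, which is the property the Shuffle use case relies on. A real implementation would return Hadoop Path objects and would be obtained through LocalDiskUtil together with the matching FileSystem.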



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)