Posted to common-dev@hadoop.apache.org by "Sameer Paranjpye (JIRA)" <ji...@apache.org> on 2006/03/07 00:06:02 UTC

[jira] Created: (HADOOP-64) DataNode should be capable of managing multiple volumes

DataNode should be capable of managing multiple volumes
-------------------------------------------------------

         Key: HADOOP-64
         URL: http://issues.apache.org/jira/browse/HADOOP-64
     Project: Hadoop
        Type: Improvement
  Components: dfs  
    Versions: 0.2    
    Reporter: Sameer Paranjpye
    Priority: Minor
     Fix For: 0.2


The dfs Datanode can only store data on a single filesystem volume. When a node runs its disks as JBOD, this means running one Datanode per disk on the machine. While the scheme works reasonably well on small clusters, on larger installations (several hundred nodes) it implies a very large number of Datanodes, with associated management overhead in the Namenode.

The Datanode should be enhanced to handle multiple volumes on a single machine.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Assigned: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-64?page=all ]

Sameer Paranjpye reassigned HADOOP-64:
--------------------------------------

    Assign To: Konstantin Shvachko




[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12371494 ] 

Doug Cutting commented on HADOOP-64:
------------------------------------

I am not convinced, without benchmarks, that this is required.  Multiple datanodes in the same JVM (as currently implemented) share a single TCP connection to the namenode.  However, each currently sends separate heartbeats to the namenode.  Thus the primary impact of the proposed change would be that these heartbeats could be combined into a single RPC.  The processing on the server would be the same, only spread over fewer RPC calls.  So this change is primarily warranted if heartbeat RPC overhead dominates namenode performance.  Even if that proves to be the case, we could achieve a similar effect much more simply by increasing the heartbeat interval.





[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426354 ] 
            
Milind Bhandarkar commented on HADOOP-64:
-----------------------------------------

About checkpointing datanodes: I agree that it is needless complexity, and I was confused about this as well. But as Konstantin pointed out to me, the datanode checkpointing proposal is NOT about checkpointing the datanodes' state, but about including the datanodes' block reports in the namenode checkpoint. Thanks, Konstantin.

As the proposal (and the implementation) currently stands, if dfs.data.dir is read-only, the datanode reports itself as dead, since block-delete and similar operations cannot be carried out on it. The namenode treats that datanode as dead and tries to re-replicate its blocks on other datanodes. The same behavior will continue, except that the datanode will not report itself as dead if at least one volume in the dfs.data.dir list is read-write. However, it will not report blocks contained in read-only volumes.

Storage-ID continues to be one per datanode. Putting blocks in different volumes is datanode-internal.

DF.java contains code to detect mount points; these will be used to differentiate between disks. Even if the detection is not right, it does not preclude correct operation of the datanode; only performance is affected. Performance will be maximized if all volumes specified in dfs.data.dir are located on different local disks.

Making read-only mounts visible on namenode is an orthogonal issue. My proposal specifies a backward-compatible way of dealing with it.

Using the last x bits of the block id to map a block to a local directory will minimize the datanode's state and keep each directory small (since block ids are random). Consider it an implicit hashtable on disk.



[jira] Updated: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-64?page=all ]

Milind Bhandarkar updated HADOOP-64:
------------------------------------

    Status: Patch Available  (was: Open)

Patch attached.



[jira] Updated: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-64?page=all ]

Doug Cutting updated HADOOP-64:
-------------------------------

    Fix Version: 0.5.0
                     (was: 0.4.0)




[jira] Updated: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-64?page=all ]

Doug Cutting updated HADOOP-64:
-------------------------------

    Fix Version: 0.3
                     (was: 0.2)




[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Yoram Arnon (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426366 ] 
            
Yoram Arnon commented on HADOOP-64:
-----------------------------------

dfs.data.dir is currently also used to specify the location of temporary files written by the dfs client (data is written to disk, then an entire dfs block is streamed to the datanodes). Rather than trying to support multiple-volume behaviour there too, let's separate the client config from the datanode config, using 'client.tempdata.dir', and try to make the change backwards compatible.

Read-only drives are hard to maintain except by totally ignoring them, since data cannot be deleted from them. If a file is deleted and its block id is later reclaimed for another file, bad things might happen if a stale block with that id is still served from some read-only volume. If it's the last copy of a block, *and* the volume is read-only and on its way to being dead, then that block is unfortunately lost.

Round-robin is a bit crude as an allocation scheme; allocation proportional to free space would work better, IMO.
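Yoram's proportional-to-free-space idea could be sketched roughly as follows; the class and method names here are hypothetical, not part of the actual Hadoop source:

```java
import java.util.Random;

// Hypothetical sketch: choose a volume for a new block with probability
// proportional to its free space, rather than strict round-robin.
public class FreeSpacePolicy {
    private final Random rng = new Random();

    // freeBytes[i] is the free space on volume i; returns the chosen volume
    // index, or -1 if every volume is full.
    public int chooseVolume(long[] freeBytes) {
        long total = 0;
        for (long f : freeBytes) {
            total += f;
        }
        if (total <= 0) {
            return -1;
        }
        // Draw a point uniformly in [0, total) and find the volume it falls in.
        long point = (long) (rng.nextDouble() * total);
        for (int i = 0; i < freeBytes.length; i++) {
            point -= freeBytes[i];
            if (point < 0) {
                return i;
            }
        }
        return freeBytes.length - 1; // guard against rounding at the upper edge
    }
}
```

A volume with twice the free space then receives roughly twice the new blocks, which tends to equalize fill levels over time.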



[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426393 ] 
            
Konstantin Shvachko commented on HADOOP-64:
-------------------------------------------

= I believe there was a misunderstanding on the datanode checkpointing issue.
HADOOP-306 proposes to checkpoint only the list of datanodes, effectively DatanodeInfo.
It was not meant to store the datanode block reports.
The block map is not and should not be checkpointed.

= DF on Windows will return the drive letter, which can be used to distinguish disks.
It will only work for local disks, though; mounted (mapped network) drives on Windows won't work.

= I agree the storageID should be the same per node. It will need to be stored separately on each drive.
Otherwise, if only one drive stores the id and gets corrupted, we will not be able to restore the
storage id for the other drives. Also, the storage files on each drive should be locked when the datanode
starts, to prevent running multiple datanodes with the same blocks.

= It is a good idea that the number of directories is a power of 2.
But I do not support the idea of reserving any number of bits of the block id to determine block locations, for 2 reasons:
a) Block replicas can have different locations on different datanodes.
b) The block id is issued by the namenode, and it is not good if the namenode needs to
know about a datanode's storage setup.
Instead, we can partition the bit representation of the block id into a number of parts consistent
with the number of directories and e.g. XOR them. The result will represent the directory name.
I think this will be random enough.
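As a sketch of this XOR idea (the class name and chunk width are hypothetical assumptions, not actual Hadoop code), the datanode could fold a 64-bit block id into a directory index like this:

```java
// Hypothetical sketch: fold the bit representation of a block id by XOR-ing
// successive fixed-width chunks, then use the result (always < numDirs) as
// the directory index. numDirs must be a power of 2; the namenode never
// needs to know about this mapping.
public class DirChooser {
    public static int dirIndex(long blockId, int numDirs) {
        if (numDirs <= 1) {
            return 0;
        }
        int bits = Integer.numberOfTrailingZeros(numDirs); // log2(numDirs)
        long folded = 0;
        // XOR each 'bits'-wide chunk of the id, lowest chunk first.
        for (long v = blockId; v != 0; v >>>= bits) {
            folded ^= v & (numDirs - 1);
        }
        return (int) folded;
    }
}
```

Since block ids are random, XOR-folding keeps the directory index uniformly distributed without the namenode reserving any bits.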

= I don't think the datanode can even start on a read-only disk.
The storage file won't open.



[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12371504 ] 

Sameer Paranjpye commented on HADOOP-64:
----------------------------------------

Heartbeat overhead aside, a second motivator for this change is the simplification of block placement logic. The namenode wouldn't have to track the datanode-machine mapping to ensure that replicas of a block don't end up on the same box. I don't know if this is currently done, but if not it could be a problem. Also, if block placement decisions are to be made based on some measure of load (number of blocks being read/written, CPU usage, etc.), the load needs to be tracked at the machine level. It's simpler not to have the namenode compute the aggregate load per machine.




[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12420592 ] 

Doug Cutting commented on HADOOP-64:
------------------------------------

The block placement algorithm does check that copies are not placed on nodes on the same host.




[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Yoram Arnon (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426452 ] 
            
Yoram Arnon commented on HADOOP-64:
-----------------------------------

For more discussion of the read-only disk issue, see HADOOP-163.



[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12432684 ] 
            
Doug Cutting commented on HADOOP-64:
------------------------------------

I forgot to name the bug in the commit message.  So, for the record, the commit was:
  http://svn.apache.org/viewvc?view=rev&revision=440508



[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426338 ] 
            
Milind Bhandarkar commented on HADOOP-64:
-----------------------------------------

Proposal:

In the configuration (e.g. hadoop-site.xml), the site admin can specify a comma-separated list of volumes as the value of the key "dfs.data.dir". These volumes are assumed to be mounted on different disks. Thus the total disk capacity for the datanode is assumed to be the sum of the disk capacities of these volumes, taking into account the /dev/sda* or /dev/hda* mapping of these volumes (i.e. not counting the same /dev/* twice).
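Under this proposal, the site configuration might look like the following; the paths are purely illustrative:

```xml
<!-- hadoop-site.xml: illustrative values assuming the proposal above -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/disk0/dfs/data,/mnt/disk1/dfs/data,/mnt/disk2/dfs/data</value>
  <description>Comma-separated list of volumes, one per local disk.</description>
</property>
```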

New blocks are created round-robin across these volumes. The policy for block allocation is controlled by a separable piece of code, so that different policies can be substituted at runtime later. The mapping of data blocks to volume ids is kept in the datanode's memory. When the datanode comes up again, it discovers this mapping by reading the specified volumes. Later, when the datanode is also periodically checkpointed, this mapping will be stored in the checkpoint as well.

Each volume is further automatically split into multiple subdirectories (the number of these directories is configurable, and should be a power of 2), so that the last x bits of a block id determine which subdirectory the block is stored in. This is the scheme used in Mike's patch for HADOOP-50.
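Taken together, the round-robin choice across volumes and the last-x-bits subdirectory split could be sketched as follows; the class name and directory naming are hypothetical, not taken from the actual patch:

```java
import java.io.File;

// Hypothetical sketch of the proposal above: pick volumes round-robin, then
// place the block in one of numSubdirs (a power of 2) subdirectories chosen
// by the last x bits of the block id.
public class BlockPlacer {
    private final File[] volumes;
    private final int numSubdirs; // must be a power of 2
    private int next = 0;         // round-robin cursor

    public BlockPlacer(File[] volumes, int numSubdirs) {
        this.volumes = volumes;
        this.numSubdirs = numSubdirs;
    }

    // Directory for a new block: next volume in rotation, then
    // subdirectory index = blockId & (numSubdirs - 1).
    public File dirFor(long blockId) {
        File vol = volumes[next];
        next = (next + 1) % volumes.length;
        long subdir = blockId & (numSubdirs - 1);
        return new File(vol, "subdir" + subdir);
    }
}
```

Because the mask keeps only the low bits of a random block id, blocks spread evenly over the subdirectories, keeping each directory small.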

If a datanode is re-configured with a different number (or different locations) of volumes in dfs.data.dir, the blocks stored in the earlier locations are considered by the datanode to be lost (when, in the future, the datanode is checkpointed, it will try to recover those "lost" blocks). If one of the volumes is read-only, the datanode will currently be considered dead only with respect to that volume, i.e. it will continue to store blocks in the read-write volumes, but blocks in the read-only volumes will be considered lost, since they cannot be deleted.

Please comment on this proposal ASAP, so that I can go ahead with the implementation.




[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12427595 ] 
            
Doug Cutting commented on HADOOP-64:
------------------------------------

Since we only expect to have around 10k blocks per node, storing the table in memory should not be a problem.  Even with 100k block ids per node, at 100 bytes of RAM per block id, a datanode would only require 10MB.  So optimizing this seems premature.



[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Yoram Arnon (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426770 ] 
            
Yoram Arnon commented on HADOOP-64:
-----------------------------------

We could consider the datanode making some kind of best effort if one volume is read-only, by copying that volume's data locally to the other volumes before shutting down the bad volume. Best effort only: if there isn't enough space for everything, just copy what you can and move on. It may decrease the number of blocks that need to be re-replicated over the network.



[jira] Assigned: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-64?page=all ]

Milind Bhandarkar reassigned HADOOP-64:
---------------------------------------

    Assignee: Milind Bhandarkar  (was: Konstantin Shvachko)

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs its disks as JBOD, this means running a Datanode per disk on the machine. While the scheme works reasonably well on small clusters, on larger installations (several hundred nodes) it implies a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.


[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426300 ] 
            
Milind Bhandarkar commented on HADOOP-64:
-----------------------------------------

I have profiled the namenode, and most activity in the namenode happens in response to heartbeat and block-report messages. Therefore I think it is important to enable the datanode to handle multiple volumes. This also relates to HADOOP-50, which needs to handle multiple directories. The scheme I have in mind is for the datanode to load-balance among volumes (which correspond to multiple disks); within a volume, block placement will be spread across multiple directories according to block id. I am currently preparing a proposal on this issue.

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Konstantin Shvachko
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs its disks as JBOD, this means running a Datanode per disk on the machine. While the scheme works reasonably well on small clusters, on larger installations (several hundred nodes) it implies a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.


[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Johan Oskarson (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12420585 ] 

Johan Oskarson commented on HADOOP-64:
--------------------------------------

Noticed there's an unanswered question in the comment above.
How well does Hadoop handle multiple volumes?
Since it starts one datanode per volume, is there a risk (even if a small one) that all the replicas of a block in a file with replication level 3 might end up on a single physical node, if that node has >= 3 volumes?

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>          Key: HADOOP-64
>          URL: http://issues.apache.org/jira/browse/HADOOP-64
>      Project: Hadoop
>         Type: Improvement

>   Components: dfs
>     Versions: 0.2.0
>     Reporter: Sameer Paranjpye
>     Assignee: Konstantin Shvachko
>     Priority: Minor
>      Fix For: 0.5.0

>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs its disks as JBOD, this means running a Datanode per disk on the machine. While the scheme works reasonably well on small clusters, on larger installations (several hundred nodes) it implies a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.



[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426347 ] 
            
Sameer Paranjpye commented on HADOOP-64:
----------------------------------------

Can we effectively map volumes to devices on Windows? Will 'df' under Cygwin produce a comprehensible mapping of paths to devices? Maybe this should be left out of the implementation?

Code for monitoring disk capacity on the datanode will need to be updated to run 'df' on all volumes considered. Round-robin placement needs to account for differences in capacity across the various volumes.

How does this interact with Konstantin's storage-id implementation? We will now need one storage ID across multiple volumes.

Do we need to use the last x bits of a block id to map it to a directory? Maybe we should use a simple round-robin scheme here as well. The amount of state is small enough to keep in a hashtable, no?

Do we ever need to checkpoint datanodes? That seems like a separable discussion. In any case, the less state we keep in side files the better.

We should include a mechanism to make read-only volumes visible on the namenode, as part of the health/status page, so that admins can be alerted in a timely manner.

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs its disks as JBOD, this means running a Datanode per disk on the machine. While the scheme works reasonably well on small clusters, on larger installations (several hundred nodes) it implies a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.


[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426764 ] 
            
Milind Bhandarkar commented on HADOOP-64:
-----------------------------------------

Thanks for your inputs Yoram, Konstantin, Bryan, Sameer.

Here is my modified proposal:

1. The config parameter dfs.data.dir may contain a list of directories separated by commas.
2. Another config parameter (client.buffer.dir) will contain a comma-separated list of directories for buffering blocks until they are sent to a datanode. The DFS client will manage the in-memory map of blocks to these directories.
3. The datanode will maintain an in-memory map of block ids to storage locations.
4. The datanode will choose an appropriate location to write a block based on a separate block-to-volume placement strategy. Information about volumes will be made available to this strategy via DF.
5. The datanode will try to report correct available disk space by appropriately taking into account the space reported by DF on each volume. If the mount point is the same for more than one volume, the available disk space will not be counted twice.
6. The storage ID will be unique per datanode, and will be stored at the top level of each of the volumes.
7. Each volume will further be divided into a shallow directory hierarchy, with a maximum of N blocks per directory. This block-to-directory mapping will also be maintained in a hashtable by the datanode. As a directory fills up, a new directory will be created as a sibling, up to a maximum of N siblings; then a second level of directories will start. The parameter N can be specified as a config variable "dfs.data.numdir".
8. Only if all the volumes specified in dfs.data.dir are read-only will the datanode shut down. Otherwise, it will log the read-only directories and treat them as if they were never specified in the dfs.data.dir list. This behavior is consistent with the current state of the implementation.
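Items 1, 4, and 8 above can be sketched roughly as follows. This is an illustrative sketch only; the class and method names are made up and are not the actual Hadoop API. It parses a comma-separated directory list, skips unusable (e.g. read-only) volumes instead of shutting down, and round-robins writes across the volumes that have room:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Illustrative multi-volume selection; names are hypothetical, not Hadoop's actual API. */
class VolumePicker {
    private final List<File> volumes = new ArrayList<>();
    private int next = 0; // round-robin cursor (item 4)

    VolumePicker(String dfsDataDir) {
        // Item 1: dfs.data.dir is a comma-separated list of directories.
        for (String dir : dfsDataDir.split(",")) {
            File f = new File(dir.trim());
            // Item 8: log and skip read-only/unusable volumes rather than shutting down,
            // as long as at least one writable volume remains.
            if (f.isDirectory() && f.canWrite()) {
                volumes.add(f);
            } else {
                System.err.println("Skipping unusable volume: " + f);
            }
        }
        if (volumes.isEmpty()) {
            throw new IllegalStateException("No writable volumes in dfs.data.dir");
        }
    }

    /** Item 4: pick the next volume that can hold the block (simple round robin). */
    File pickVolume(long blockSize) {
        for (int i = 0; i < volumes.size(); i++) {
            File v = volumes.get(next);
            next = (next + 1) % volumes.size();
            if (v.getUsableSpace() >= blockSize) {
                return v;
            }
        }
        throw new IllegalStateException("Out of space on all volumes");
    }
}
```

A capacity-weighted strategy (per Sameer's comment about volumes of differing sizes) could replace the round robin without changing the callers.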

If there are any other issues to think about, please comment.


> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs its disks as JBOD, this means running a Datanode per disk on the machine. While the scheme works reasonably well on small clusters, on larger installations (several hundred nodes) it implies a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.


[jira] Updated: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-64?page=all ]

eric baldeschwieler updated HADOOP-64:
--------------------------------------

    Fix Version: 0.4
                     (was: 0.3)

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>          Key: HADOOP-64
>          URL: http://issues.apache.org/jira/browse/HADOOP-64
>      Project: Hadoop
>         Type: Improvement

>   Components: dfs
>     Versions: 0.2
>     Reporter: Sameer Paranjpye
>     Assignee: Konstantin Shvachko
>     Priority: Minor
>      Fix For: 0.4

>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs its disks as JBOD, this means running a Datanode per disk on the machine. While the scheme works reasonably well on small clusters, on larger installations (several hundred nodes) it implies a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.



[jira] Updated: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-64?page=all ]

Doug Cutting updated HADOOP-64:
-------------------------------

        Status: Resolved  (was: Patch Available)
    Resolution: Fixed

I just committed this.  Thanks, Milind!

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>         Attachments: multiple-volumes.patch
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs its disks as JBOD, this means running a Datanode per disk on the machine. While the scheme works reasonably well on small clusters, on larger installations (several hundred nodes) it implies a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.


[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426370 ] 
            
Bryan Pendleton commented on HADOOP-64:
---------------------------------------

It sounds like it's a different issue, but I'm still worried about the "treat read-only as dead" option.

I regularly see old drives throw enough IDE errors to get remounted read-only. Often a reboot will bring the drive back online just fine - and bring back the blocks that weren't deleted from it. If that block number was reused in the meantime, we'll have a problem anyway, regardless of whether we treat that block as readable once the initial read-only condition is encountered. Meanwhile, we've possibly missed an opportunity to save a block from early death, since a copy of it is still actually available.

I'll probably open another issue around this later on, once the current one is closed, just to clarify. In the meantime, is there a realistic issue with block numbers being reused, such that an old block coming back online would corrupt things? Should a datanode maybe periodically request the CRCs for its blocks and check that they still match? This generally falls into the space of "a good idea" anyway, since, barring reading a block, there's no way to tell if its disk has gone bad.
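The periodic CRC re-check suggested above could look something like this minimal sketch using java.util.zip.CRC32. The class and method names are made up for illustration, and it ignores how the expected checksum is obtained from the namenode or stored metadata:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

/** Illustrative periodic scan: recompute a block file's CRC and compare to a stored value. */
class BlockScanner {
    /** Stream the block file through CRC32 to avoid loading it into memory. */
    static long crcOf(Path blockFile) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(blockFile)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    /** Returns true if the on-disk block still matches its expected checksum. */
    static boolean verify(Path blockFile, long expectedCrc) throws IOException {
        return crcOf(blockFile) == expectedCrc;
    }
}
```

A datanode could run such a scan at a low rate in the background and report mismatches to the namenode so the bad replica is re-replicated from a good copy.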

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs its disks as JBOD, this means running a Datanode per disk on the machine. While the scheme works reasonably well on small clusters, on larger installations (several hundred nodes) it implies a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.


[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12427026 ] 
            
Konstantin Shvachko commented on HADOOP-64:
-------------------------------------------

This proposition looks good to me.
The only thing that seems excessive is the dynamic data structure for maintaining
the blockid-to-directory mapping.
The alternative is to do a static mapping based on block ids and the number of directories.
Suppose that the maximal number of entries per directory is N. We could define a function
      dirName( blockId, N, dirLevel )
which returns a local directory name for each level of the directory tree.
Then the datanode needs to store only the current height of the directory tree, H.
For a given blockId, its path is determined by
      / dirName(blockId, N, 0) / dirName(blockId, N, 1) / ... / dirName(blockId, N, H)
And when the datanode needs to add a new directory level, it will not need
to rename anything in the existing directory tree.
A disadvantage of this approach is that the directories must be
restructured if the maximal number of entries per directory is changed.
But the same applies to the dynamic approach, at least when N is decreased.
We might consider hardcoding N rather than making it configurable.
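One possible concrete choice of such a static dirName is to treat the block id as a base-N number and let the digit at each position name the directory at that level, so no per-block state is needed. This is only a sketch of the idea; the naming scheme and class are hypothetical, not necessarily what was implemented:

```java
/** Hypothetical static block-to-directory mapping along the lines described above. */
class BlockPath {
    // The base-N digit at position (level + 1) names the directory at that level,
    // so block ids differing only in the last digit share a leaf directory:
    // each leaf holds at most N block files.
    static String dirName(long blockId, int n, int level) {
        long digit = (Math.abs(blockId) / pow(n, level + 1)) % n;
        return "dir" + digit;
    }

    // Integer power helper (n^exp) to keep the mapping purely arithmetic.
    static long pow(int base, int exp) {
        long r = 1;
        for (int i = 0; i < exp; i++) r *= base;
        return r;
    }

    /** Full relative path for a block, given the current tree height H. */
    static String pathFor(long blockId, int n, int height) {
        StringBuilder sb = new StringBuilder();
        for (int level = 0; level <= height; level++) {
            sb.append('/').append(dirName(blockId, n, level));
        }
        return sb.toString();
    }
}
```

Because the path is a pure function of (blockId, N, H), growing the tree by one level only changes where new blocks land; existing blocks keep their paths and nothing needs renaming.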


> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs its disks as JBOD, this means running a Datanode per disk on the machine. While the scheme works reasonably well on small clusters, on larger installations (several hundred nodes) it implies a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.


[jira] Updated: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Milind Bhandarkar (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-64?page=all ]

Milind Bhandarkar updated HADOOP-64:
------------------------------------

    Attachment: multiple-volumes.patch

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>         Attachments: multiple-volumes.patch
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs its disks as JBOD, this means running a Datanode per disk on the machine. While the scheme works reasonably well on small clusters, on larger installations (several hundred nodes) it implies a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.


[jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426343 ] 
            
Bryan Pendleton commented on HADOOP-64:
---------------------------------------

Why do datanodes need to checkpoint? What's the value of storing the mapping, vs. re-enumerating the blocks at startup time? The namenode doesn't keep track of which nodes have which blocks, so why should a storage node keep track any more rigorously within its own state? I'd argue that all of that complexity is needless - the cost of maintaining a consistent state is way too high for little benefit.

Please make it very easy to change the block-allocation code. The default behavior of the current code has been causing trouble on my very heterogeneous cluster for a long time - uniform distribution only really makes sense if the same amount of space is available on each drive. In all other cases it leads directly to unnecessary failures.

I'm not sure about the "blocks considered lost on read-only volumes" bit, but if that implies the blocks become unavailable, then I think the approach is too heavy-handed. Those blocks might be the only copies, and ignoring them means the cluster might not be able to find a live copy of a block anywhere else. Please clarify what a "lost" block is.

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs its disks as JBOD, this means running a Datanode per disk on the machine. While the scheme works reasonably well on small clusters, on larger installations (several hundred nodes) it implies a very large number of Datanodes with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.
