
Posted to common-user@hadoop.apache.org by Jason Venner <ja...@attributor.com> on 2008/02/12 20:51:08 UTC

Question on DFS block placement and 'what is a rack' wrt DFS block placement

We are starting to build larger clusters, and want to better understand 
how to configure the network topology.
Up to now we have just been setting up a private vlan for the small 
clusters.

We have been thinking about the following machine configurations
Compute nodes with a number of spindles and medium disk, that also serve DFS
For every 4-8 of the above, one compute node with a large number of 
spindles with a large number of disks, to bulk out the DFS capacity.

We are wondering what the best practices are for network topology in 
clusters that are built out of the above building blocks.
We can readily have 2 or 4 network cards in each node.
-- 
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Doug Cutting <cu...@apache.org>.

Jason Venner wrote:
> Is disk arm contention (seek) a problem in a 6 disk configuration as 
> most likely all of the disks would be serving /local/ and /dfs/?

It should not be.  MapReduce I/O is sequential, in chunks large 
enough that seeks should not dominate.

Doug
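
To put rough numbers on the point about sequential I/O, here is a back-of-the-envelope sketch. The seek time and transfer rate are illustrative assumptions, not measurements from this thread, and `seek_fraction` is a made-up helper name. It estimates what share of the time one seek costs relative to reading a chunk of a given size.

```python
# Back-of-the-envelope estimate of how much a single seek costs relative to
# a sequential read of one chunk. Both constants are illustrative assumptions.
SEEK_MS = 10.0            # assumed average seek plus rotational latency
TRANSFER_MB_PER_S = 60.0  # assumed sustained sequential transfer rate


def seek_fraction(chunk_mb, seek_ms=SEEK_MS, mb_per_s=TRANSFER_MB_PER_S):
    """Return the fraction of total time spent seeking for one chunk read."""
    transfer_ms = chunk_mb / mb_per_s * 1000.0
    return seek_ms / (seek_ms + transfer_ms)


# Compare a small random read, a 1 MB read, and a 64 MB chunk.
for chunk_mb in (0.064, 1, 64):
    print(f"{chunk_mb:>7} MB chunk: {seek_fraction(chunk_mb):.1%} of time seeking")
```

Under these assumed figures the seek cost is dominant for tiny reads and close to negligible once chunks reach tens of megabytes, which is the regime described above.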

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Ted Dunning <td...@veoh.com>.

Hadoop is pretty good at doing long sequential reads and writes.  I would
guess that would allow the I/O scheduler plenty of opportunity to optimize
operations.

On 2/12/08 1:11 PM, "Jason Venner" <ja...@attributor.com> wrote:

> Is disk arm contention (seek) a problem in a 6 disk configuration as
> most likely all of the disks would be serving /local/ and /dfs/?

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Jason Venner <ja...@attributor.com>.

Is disk arm contention (seek) a problem in a 6 disk configuration as 
most likely all of the disks would be serving /local/ and /dfs/?

Doug Cutting wrote:
> Jason Venner wrote:
>> We have 3 types of machines we can get, 2  disk, 6 disk  and 16 disk 
>> machines. They all have 4 dual core cpus.
>>
>> The 2 disk machines have about 1 TB, the 6 disks about 3TB and the 16 
>> disk about 8TB. The 16 disk machines have about 25% slower CPU's than 
>> the 2/6 disk machines.
>>
>> We handle a lot of bulky data, and don't think we can fit it all on 
>> the 3TB machines if those are our sole compute/dfs nodes.
>
> Your performance will be better if you buy enough of the 6 disk nodes 
> to hold all your data than if you intermix 16 disk nodes.  Are the 16 
> disk nodes considerably cheaper per byte stored than the 6 disk boxes?
>
>> From my reading, I conjecture that an ideal configuration would be 1 
>> local disk per cpu for local data/reducing, and some number of 
>> separate disks for dfs.
>> Is this an accurate assessment?
>
> DFS storage is typically local on compute nodes.
>
> Doug


Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Ted Dunning <td...@veoh.com>.

I would concur that it is much better to have sufficient storage in the
compute farm for DFS files to be local for the compute tasks.

Also, a 16 disk machine typically costs a good bit more than a 6 disk
machine + 10 disks because you usually require a second chassis.  Sun's
Thumper would be an interesting counter-example of this.

I have found (in my limited experience) that you want as many disk
controllers as you can get and that you want the disk as close to your
compute power as possible.  For me, that means that my ideal machine is a
moderate CPU or two attached to 1-3 TB of storage.  My smallest machines
have slow CPU with two SATA drives (could be 2 x 500GB, but mostly are 500GB
+ 73GB for historical reasons).  These machines can be had for <$500
second-hand and <$1000 new from reputable vendors.  My larger machines have
6 disks and dual Xeons, but cost about $3-4K and only have about twice the
net Hadoop throughput and take up twice the rack space.  I would *much*
rather have 6 times as many of the little boxes.
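
The comparison above can be worked through as simple arithmetic. This is only a sketch: throughput and rack space are normalized so that one small box equals 1.0, the prices are the quoted upper bound (<$1000 new) and the quoted $3-4K range, and the dictionary keys and function names are hypothetical.

```python
# Figures quoted in the message above. Throughput and rack space are
# normalized so that one small box = 1.0 (an assumption for illustration).
BOXES = {
    "small (2 disks)": {"cost_usd": 1000, "throughput": 1.0, "rack_space": 1.0},
    "large (6 disks), low price": {"cost_usd": 3000, "throughput": 2.0, "rack_space": 2.0},
    "large (6 disks), high price": {"cost_usd": 4000, "throughput": 2.0, "rack_space": 2.0},
}


def throughput_per_1000_usd(box):
    """Normalized Hadoop throughput bought per $1000."""
    return box["throughput"] * 1000.0 / box["cost_usd"]


def throughput_per_rack_space(box):
    """Normalized Hadoop throughput per unit of rack space."""
    return box["throughput"] / box["rack_space"]


for name, box in BOXES.items():
    print(
        f"{name}: {throughput_per_1000_usd(box):.2f} per $1000, "
        f"{throughput_per_rack_space(box):.2f} per unit of rack space"
    )
```

On these numbers the small boxes deliver roughly 1.5 to 2 times the throughput per dollar, while throughput per unit of rack space comes out about even.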


On 2/12/08 1:01 PM, "Doug Cutting" <cu...@apache.org> wrote:

>> From my reading, I conjecture that an ideal configuration would be 1
>> local disk per cpu for local data/reducing, and some number of separate
>> disks for dfs.
>> Is this an accurate assessment?
> 
> DFS storage is typically local on compute nodes.

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Doug Cutting <cu...@apache.org>.

Jason Venner wrote:
> We have 3 types of machines we can get, 2  disk, 6 disk  and 16 disk 
> machines. They all have 4 dual core cpus.
> 
> The 2 disk machines have about 1 TB, the 6 disks about 3TB and the 16 
> disk about 8TB. The 16 disk machines have about 25% slower CPU's than 
> the 2/6 disk machines.
> 
> We handle a lot of bulky data, and don't think we can fit it all on the 
> 3TB machines if those are our sole compute/dfs nodes.

Your performance will be better if you buy enough of the 6 disk nodes to 
hold all your data than if you intermix 16 disk nodes.  Are the 16 disk 
nodes considerably cheaper per byte stored than the 6 disk boxes?

> From my reading, I conjecture that an ideal configuration would be 1 
> local disk per cpu for local data/reducing, and some number of separate 
> disks for dfs.
> Is this an accurate assessment?

DFS storage is typically local on compute nodes.

Doug

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Ted Dunning <td...@veoh.com>.

Why not down-grade the CPU power and increase the number of chassis to get
more disks (and controllers and network interfaces)?

On 2/12/08 12:53 PM, "Jason Venner" <ja...@attributor.com> wrote:

> We have 3 types of machines we can get, 2  disk, 6 disk  and 16 disk
> machines. *They all have 4 dual core cpus.*

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Jason Venner <ja...@attributor.com>.

We have 3 types of machines we can get, 2  disk, 6 disk  and 16 disk 
machines. They all have 4 dual core cpus.

The 2 disk machines have about 1 TB, the 6 disks about 3TB and the 16 
disk about 8TB. The 16 disk machines have about 25% slower CPU's than 
the 2/6 disk machines.

We handle a lot of bulky data, and don't think we can fit it all on the 
3TB machines if those are our sole compute/dfs nodes.

From my reading, I conjecture that an ideal configuration would be 1 
local disk per cpu for local data/reducing, and some number of separate 
disks for dfs.
Is this an accurate assessment?

Doug Cutting wrote:
> If you're building a cluster from scratch, why not put a medium number 
> of disk on all nodes, rather than some with more and some with less? 
> That's the optimal configuration for Hadoop, since it best distributes 
> data among computing nodes.
>
> Doug
>
> Jason Venner wrote:
>> We are starting to build larger clusters, and want to better 
>> understand how to configure the network topology.
>> Up to now we have just been setting up a private vlan for the small 
>> clusters.
>>
>> We have been thinking about the following machine configurations
>> Compute nodes with a number of spindles and medium disk, that also 
>> serve DFS
>> For every 4-8 of the above, one compute node with a large number of 
>> spindles with a large number of disks, to bulk out the DFS capacity.
>>
>> We are wondering what the best practices are for network topology in 
>> clusters that are built out of the above building blocks.
>> We can readily have 2 or 4 network cards in each node.
>

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Doug Cutting <cu...@apache.org>.

If you're building a cluster from scratch, why not put a medium number 
of disk on all nodes, rather than some with more and some with less? 
That's the optimal configuration for Hadoop, since it best distributes 
data among computing nodes.

Doug

Jason Venner wrote:
> We are starting to build larger clusters, and want to better understand 
> how to configure the network topology.
> Up to now we have just been setting up a private vlan for the small 
> clusters.
> 
> We have been thinking about the following machine configurations
> Compute nodes with a number of spindles and medium disk, that also serve 
> DFS
> For every 4-8 of the above, one compute node with a large number of 
> spindles with a large number of disks, to bulk out the DFS capacity.
> 
> We are wondering what the best practices are for network topology in 
> clusters that are built out of the above building blocks.
> We can readily have 2 or 4 network cards in each node.

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Jason Venner <ja...@attributor.com>.

We run Intel CPUs. The newer northbridges seem to have more memory 
bandwidth than the older ones.
We have a mix of special-purpose assembly that we call from Java, so we 
are locked into Intel right now.
The performance benchmarks I have seen suggest Java runs about 30% 
faster on the AMDs due to the higher memory bandwidth.

Colin Evans wrote:
> Because of acquiring servers of different capacities at different 
> times, we have 2 servers with 1TB of disk each, and 11 servers with 
> ~300GB each.  The 1TB servers tend to be under-utilized by HDFS given 
> their capacity.  This makes sense, as block replicas need to be 
> relatively evenly distributed across the cluster in order to allow 
> tasks to be run close to data.  For our next cluster, we're going with 
> uniform disk, CPU, and memory configurations.
> The big question for me is how well a dual-CPU 4-core (8 cores per 
> box) configuration will do.  Has anyone tried out this configuration 
> with Intel or AMD CPUs?  Is the memory throughput sufficient?
>
>
> Jason Venner wrote:
>> We are starting to build larger clusters, and want to better 
>> understand how to configure the network topology.
>> Up to now we have just been setting up a private vlan for the small 
>> clusters.
>>
>> We have been thinking about the following machine configurations
>> Compute nodes with a number of spindles and medium disk, that also 
>> serve DFS
>> For every 4-8 of the above, one compute node with a large number of 
>> spindles with a large number of disks, to bulk out the DFS capacity.
>>
>> We are wondering what the best practices are for network topology in 
>> clusters that are built out of the above building blocks.
>> We can readily have 2 or 4 network cards in each node.
>


Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Ted Dunning <td...@veoh.com>.

It isn't popular much anymore, but once upon a time, network topology for
clustering was a big topic.  Since then, switches have gotten pretty fast
and worrying about these things has gone out of fashion a bit other than
something on the level of the current rack-aware locality in Hadoop.

With 4 NIC's, you could replay some history, however, by building what
amounts to a 4-dimensional hyper-torus.  I *think* you could do pretty well by
having four parallel two-level switch networks and assign boxes to second
level switches according to a systematic pattern so that you would have
local access to much of the cluster.  As a simple example, suppose that you
have 16 machines M1 through M16, each with two interfaces.  You would have
two top level switches T1 and T2 and each of those would have four second
level switches S1.1 ... S1.4 connected to T1 and S2.1 ... S2.4 connected to
T2.  The machine connectivity on each interface would look like this:

Machine   eth0   eth1
   1      S1.1   S2.1
   2      S1.1   S2.2
   3      S1.1   S2.3
   4      S1.1   S2.4
   5      S1.2   S2.1
   6      S1.2   S2.2
   7      S1.2   S2.3
   8      S1.2   S2.4
   9      S1.3   S2.1
  10      S1.3   S2.2
  11      S1.3   S2.3
  12      S1.3   S2.4
  13      S1.4   S2.1
  14      S1.4   S2.2
  15      S1.4   S2.3
  16      S1.4   S2.4

With this arrangement machine M1 is on a local switch with M2, M3, M4, M5,
M9, and M13 which is twice as many machines as would be local if only one
interface were used.  With four interfaces and four machines on a local
switch, the entire cluster is local and T1 and T2 should see no traffic.  In
a larger cluster, you get the same 16x benefit in locality.
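
The wiring pattern in the table above can be checked mechanically. The following is a minimal Python sketch and has nothing to do with Hadoop itself; the function names `build_wiring` and `local_peers` are made up for illustration. It rebuilds the 16-machine, two-interface table and lists which machines share a second-level switch with a given machine.

```python
from itertools import product


def build_wiring(switches_per_tier=4):
    """Map machine number -> (eth0 switch, eth1 switch), as in the table above.

    eth0 advances every `switches_per_tier` machines; eth1 cycles on every machine.
    """
    wiring = {}
    pairs = product(range(1, switches_per_tier + 1), repeat=2)
    for machine, (a, b) in enumerate(pairs, start=1):
        wiring[machine] = (f"S1.{a}", f"S2.{b}")
    return wiring


def local_peers(wiring, machine, interfaces=(0, 1)):
    """Machines sharing a second-level switch with `machine` on the given interfaces."""
    mine = {wiring[machine][i] for i in interfaces}
    return sorted(
        other
        for other, switches in wiring.items()
        if other != machine and mine & {switches[i] for i in interfaces}
    )


wiring = build_wiring()
print("M1 peers, eth0 only:", local_peers(wiring, 1, interfaces=(0,)))
print("M1 peers, eth0+eth1:", local_peers(wiring, 1))
```

With one interface M1 has three local peers; with both it has six, which matches the "twice as many" observation above.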

The cost is that your ops guys will kill you if you suggest something this
elaborate.  The wiring between racks alone will make this a nightmare.

On 2/12/08 3:01 PM, "Jason Venner" <ja...@attributor.com> wrote:

> In terms of more exotic situations we were discussing having 4 NIC's: 1
> for the local subset, 1 for a pair of local subsets, 1 for another pair
> of local subsets, 1 for the backbone.

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Jason Venner <ja...@attributor.com>.

Okay back to network topology then.

How does hadoop determine a 'rack' of machines?

Currently we have everything on a single VLAN, with the DFS master being 
the gateway back to our main VLAN.

We were hoping to group subsets of machines on a local switch, and 
optionally have each machine have a connection to the vlan which is the 
backbone of the entire cluster.

In terms of more exotic situations we were discussing having 4 NIC's: 1 
for the local subset, 1 for a pair of local subsets, 1 for another pair 
of local subsets, 1 for the backbone.

Our job mix is totally varied, from IO bound to CPU bound.

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Ted Dunning <td...@veoh.com>.

Fewer boxes may or may not be cheaper, else we would all be using massive
Suns.

The issue is what the right balance between disk I/O and CPU power is.  My
job mix tends to not benefit all that much from another CPU core in an
existing box, but does benefit substantially from having another spindle on
an independent controller and network interface.

On 2/12/08 1:57 PM, "Colin Evans" <co...@metaweb.com> wrote:

> When you factor in colo, power, configuration, and administration costs,
> fewer boxes is always cheaper.  I'm not expecting a 2x speedup for the
> extra CPU, but I'm curious what the hit is.
> 
> 
> 
> Ted Dunning wrote:
>> Doesn't the incremental CPU cost you as much as an entire extra box?
>> 
>> 
>> On 2/12/08 12:19 PM, "Colin Evans" <co...@metaweb.com> wrote:
>> 
>>   
>>> The big question for me is how well a dual-CPU 4-core (8 cores per box)
>>> configuration will do.  Has anyone tried out this configuration with
>>> Intel or AMD CPUs?  Is the memory throughput sufficient?
>>>     
>> 
>>   
>

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Colin Evans <co...@metaweb.com>.

When you factor in colo, power, configuration, and administration costs, 
fewer boxes is always cheaper.  I'm not expecting a 2x speedup for the 
extra CPU, but I'm curious what the hit is.

Ted Dunning wrote:
> Doesn't the incremental CPU cost you as much as an entire extra box?
>
>
> On 2/12/08 12:19 PM, "Colin Evans" <co...@metaweb.com> wrote:
>
>   
>> The big question for me is how well a dual-CPU 4-core (8 cores per box)
>> configuration will do.  Has anyone tried out this configuration with
>> Intel or AMD CPUs?  Is the memory throughput sufficient?
>>     
>
>

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Ted Dunning <td...@veoh.com>.

Doesn't the incremental CPU cost you as much as an entire extra box?

On 2/12/08 12:19 PM, "Colin Evans" <co...@metaweb.com> wrote:

> The big question for me is how well a dual-CPU 4-core (8 cores per box)
> configuration will do.  Has anyone tried out this configuration with
> Intel or AMD CPUs?  Is the memory throughput sufficient?

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Colin Evans <co...@metaweb.com>.

Because of acquiring servers of different capacities at different times, 
we have 2 servers with 1TB of disk each, and 11 servers with ~300GB 
each.  The 1TB servers tend to be under-utilized by HDFS given their 
capacity.  This makes sense, as block replicas need to be relatively 
evenly distributed across the cluster in order to allow tasks to be run 
close to data.  For our next cluster, we're going with uniform disk, 
CPU, and memory configurations. 
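
A simplified model shows why the larger servers end up under-utilized. The sketch below assumes perfectly even block distribution across nodes, treats 1TB as 1000GB, and takes "~300GB" as exactly 300GB; real HDFS placement is not this strict, and `even_fill` is a hypothetical helper, not part of Hadoop.

```python
def even_fill(capacities_gb):
    """Model a cluster where every node stores the same amount of data.

    Under that assumption the smallest node fills first, so it bounds how
    much each node can hold. Returns (usable_gb, per-node utilization list).
    """
    per_node_gb = min(capacities_gb)
    usable_gb = per_node_gb * len(capacities_gb)
    utilization = [per_node_gb / capacity for capacity in capacities_gb]
    return usable_gb, utilization


# The cluster described above: 2 servers with 1TB, 11 servers with ~300GB.
capacities = [1000] * 2 + [300] * 11
usable, utilization = even_fill(capacities)

print(f"raw capacity:    {sum(capacities)} GB")
print(f"usable capacity: {usable} GB")
print(f"utilization of a 1TB server: {utilization[0]:.0%}")
```

In this model the two 1TB servers sit at about 30% full when the 300GB servers run out of space, so roughly 1400GB of the raw 5300GB goes unused.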

The big question for me is how well a dual-CPU 4-core (8 cores per box) 
configuration will do.  Has anyone tried out this configuration with 
Intel or AMD CPUs?  Is the memory throughput sufficient?

Jason Venner wrote:
> We are starting to build larger clusters, and want to better 
> understand how to configure the network topology.
> Up to now we have just been setting up a private vlan for the small 
> clusters.
>
> We have been thinking about the following machine configurations
> Compute nodes with a number of spindles and medium disk, that also 
> serve DFS
> For every 4-8 of the above, one compute node with a large number of 
> spindles with a large number of disks, to bulk out the DFS capacity.
>
> We are wondering what the best practices are for network topology in 
> clusters that are built out of the above building blocks.
> We can readily have 2 or 4 network cards in each node.

RE: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Joydeep Sen Sarma <js...@facebook.com>.

There may still be remaining issues. One I am aware of is
https://issues.apache.org/jira/browse/HADOOP-2677 where smaller capacity
nodes become too highly utilized to store mapred intermediate output.


-----Original Message-----
From: Jason Venner [mailto:jason@attributor.com] 
Sent: Tuesday, February 12, 2008 12:02 PM
To: core-user@hadoop.apache.org
Subject: Re: Question on DFS block placement and 'what is a rack' wrt
DFS block placement

We are currently running 15.3, and hope to move to 16.1 when it comes
out...
Were the heterogeneous disk space issues fixed in 15.3?

Ted Dunning wrote:
> I have had issues with machines that are highly disparate in terms of disk
> space.  I expect that some of those issues have been mitigated in recent
> releases.
>
>
> On 2/12/08 11:51 AM, "Jason Venner" <ja...@attributor.com> wrote:
>
>
>> We are starting to build larger clusters, and want to better understand
>> how to configure the network topology.
>> Up to now we have just been setting up a private vlan for the small
>> clusters.
>>
>> We have been thinking about the following machine configurations
>> Compute nodes with a number of spindles and medium disk, that also serve DFS
>> For every 4-8 of the above, one compute node with a large number of
>> spindles with a large number of disks, to bulk out the DFS capacity.
>>
>> We are wondering what the best practices are for network topology in
>> clusters that are built out of the above building blocks.
>> We can readily have 2 or 4 network cards in each node.
>>     
>
>

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Jason Venner <ja...@attributor.com>.

We are currently running 15.3, and hope to move to 16.1 when it comes out...
Were the heterogeneous disk space issues fixed in 15.3?

Ted Dunning wrote:
> I have had issues with machines that are highly disparate in terms of disk
> space.  I expect that some of those issues have been mitigated in recent
> releases.
>
>
> On 2/12/08 11:51 AM, "Jason Venner" <ja...@attributor.com> wrote:
>
>   
>> We are starting to build larger clusters, and want to better understand
>> how to configure the network topology.
>> Up to now we have just been setting up a private vlan for the small
>> clusters.
>>
>> We have been thinking about the following machine configurations
>> Compute nodes with a number of spindles and medium disk, that also serve DFS
>> For every 4-8 of the above, one compute node with a large number of
>> spindles with a large number of disks, to bulk out the DFS capacity.
>>
>> We are wondering what the best practices are for network topology in
>> clusters that are built out of the above building blocks.
>> We can readily have 2 or 4 network cards in each node.
>>     
>
>

Re: Question on DFS block placement and 'what is a rack' wrt DFS block placement

Posted by Ted Dunning <td...@veoh.com>.


I have had issues with machines that are highly disparate in terms of disk
space.  I expect that some of those issues have been mitigated in recent
releases.


On 2/12/08 11:51 AM, "Jason Venner" <ja...@attributor.com> wrote:

> We are starting to build larger clusters, and want to better understand
> how to configure the network topology.
> Up to now we have just been setting up a private vlan for the small
> clusters.
> 
> We have been thinking about the following machine configurations
> Compute nodes with a number of spindles and medium disk, that also serve DFS
> For every 4-8 of the above, one compute node with a large number of
> spindles with a large number of disks, to bulk out the DFS capacity.
> 
> We are wondering what the best practices are for network topology in
> clusters that are built out of the above building blocks.
> We can readily have 2 or 4 network cards in each node.