Posted to mapreduce-user@hadoop.apache.org by Rob Stewart <ro...@googlemail.com> on 2009/11/07 20:01:54 UTC

Hadoop Job Performance as MapReduce count increases?

Hi. Briefly, I'm writing my dissertation on distributed computation, looking
in detail at the various interfaces atop Hadoop, including Pig, Hive, JAQL,
etc.
One thing I have noticed in early testing is that Pig tends to generate more
Map tasks for a given query than other interfaces do for an identical query
design.

So my question to you MapReduce folks is this:
------------
If there are 100 Map jobs, spread across 10 DataNodes, and one DataNode
fails, then approximately 10 Map jobs will be redistributed over the
remaining 9 DataNodes. If, however, there were 500 Map jobs over the 10
DataNodes and one of them fails, then 50 Map jobs will be reallocated to the
remaining 9 DataNodes. Should I expect a difference in overall performance
between these two scenarios?
-----------

The reason for wanting to know this is to discuss in more detail whether, in
a situation where many faults occur on the cluster, a Hadoop job with many
Map/Reduce tasks will handle the unreliability better than a Hadoop job with
far fewer Map/Reduce tasks.

If this were the case, is it true to say that, if the reliability of a
Hadoop cluster (network reliability, DataNode reliability, etc.) were known
before a job was submitted, the user submitting the job would want to adjust
the number of Map/Reduce tasks depending on that reliability?


I may be well off course with this idea, and if this is the case, do let me
know!


thanks,


Rob Stewart

Re: Hadoop Job Performance as MapReduce count increases?

Posted by Aaron Kimball <aa...@cloudera.com>.
That makes sense. It's worth pointing out that tasks are scheduled on a
"pull" basis -- tasktrackers ask the jobtracker for more work when they have
free task slots -- so it is not a given that all nodes will receive the same
number of tasks. If some tasks take considerably longer (or some nodes are
faster or slower than others), then those nodes may request fewer tasks from
the jobtracker as the job runs, so in practice you may not get results as
even as your table suggests.
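
To make that concrete, here is a toy sketch of the pull model (plain Java,
made-up class and variable names, not the actual JobTracker/TaskTracker
code): a node asks for the next pending task only when it has a free slot,
so slower or busier nodes simply ask less often and run fewer tasks.

    import java.util.ArrayDeque;
    import java.util.Queue;

    // Toy model of pull-based assignment: a node "heartbeats" for a new task
    // only when its single slot is free, so faster nodes pull more tasks.
    public class PullSchedulingSketch {
        public static void main(String[] args) {
            Queue<Integer> pendingTasks = new ArrayDeque<Integer>();
            for (int t = 0; t < 100; t++) pendingTasks.add(t);

            int[] taskDuration = {1, 1, 2, 4};   // node 3 is 4x slower
            int[] tasksTaken = new int[4];
            int[] busyUntil = new int[4];

            for (int time = 0; !pendingTasks.isEmpty(); time++) {
                for (int node = 0; node < 4; node++) {
                    if (busyUntil[node] <= time && !pendingTasks.isEmpty()) {
                        pendingTasks.poll();
                        tasksTaken[node]++;
                        busyUntil[node] = time + taskDuration[node];
                    }
                }
            }
            for (int node = 0; node < 4; node++) {
                System.out.println("node " + node + " ran " + tasksTaken[node] + " tasks");
            }
        }
    }

Running it, the two fast nodes end up with roughly four times as many tasks
as the slow one, which is the unevenness I mean.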

- Aaron

On Wed, Nov 11, 2009 at 8:30 AM, Rob Stewart <ro...@googlemail.com> wrote:

>
> Hi Aaron,
>
> your response was very useful indeed, thank you very much.
>
> OK, I've documented the scenario (relevant to my experiments), where the
> cluster is very small, only 10 nodes.
>
> I have uploaded this section only to :
>
> http://linuxsoftwareblog.com/Hadoop/small_cluster_scenario.png
>
> Can I ask, does the paragraph, and the subsequent table, make sense to
> you, and does it reflect the true nature of potential performance issues
> with a very small MapReduce task count on an unreliable small cluster?
>
>
> thanks,
>
>
> Rob Stewart

Re: Hadoop Job Performance as MapReduce count increases?

Posted by Rob Stewart <ro...@googlemail.com>.
Hi Aaron,

your response was very useful indeed, thank you very much.

OK, I've documented the scenario (relevant to my experiments), where the
cluster is very small, only 10 nodes.

I have uploaded this section only to :

http://linuxsoftwareblog.com/Hadoop/small_cluster_scenario.png

Can I ask, does the paragraph, and the subsequent table, make sense to you,
and does it reflect the true nature of potential performance issues with a
very small MapReduce task count on an unreliable small cluster?


thanks,


Rob Stewart





Re: Hadoop Job Performance as MapReduce count increases?

Posted by Aaron Kimball <aa...@cloudera.com>.
On Sat, Nov 7, 2009 at 11:01 AM, Rob Stewart <ro...@googlemail.com> wrote:

> Hi. Briefly, I'm writing my dissertation on distributed computation,
> looking in detail at the various interfaces atop Hadoop, including Pig,
> Hive, JAQL, etc.
> One thing I have noticed in early testing is that Pig tends to generate
> more Map tasks for a given query than other interfaces do for an identical
> query design.
>
> So my question to you MapReduce folks is this:
> ------------
> If there are 100 Map jobs, spread across 10 DataNodes, and one DataNode
> fails, then approximately 10 Map jobs will be redistributed over the
> remaining 9 DataNodes. If, however, there were 500 Map jobs over the 10
> DataNodes and one of them fails, then 50 Map jobs will be reallocated to
> the remaining 9 DataNodes. Should I expect a difference in overall
> performance between these two scenarios?
>


That depends entirely on how heavy the workloads are in these various jobs.
By the way, the term of art in Hadoop is "map task" -- a single MapReduce
"job" contains a set of map tasks and a set of reduce tasks, each of which
may be executed multiple times (e.g., in the event of node failure); such
re-executions of a given task are known as task attempts.

At any rate, it is likely that on a small number of nodes (e.g., 10) a
higher number of more-granular tasks will result in better overall
performance. If 500 tasks were doing the same total work as 10 tasks (one
per node), then were a node to fail, 50 tasks would need to be
redistributed. This can happen in several ways depending on which nodes have
free resources; most likely all 9 healthy nodes will share in the work.
Whereas if there was a single task on the failed node, only one other
machine could pick that up.
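
A rough back-of-the-envelope sketch of that effect (my own simplifications,
not anything Hadoop computes: equal-sized tasks, one task slot per node, and
the node fails before doing any work):

    // Relative job length on a 10-node cluster with one node down, for a
    // coarse job (10 tasks) versus a granular one (500 tasks) doing the same
    // total work. 1.0 would be the ideal no-failure run on all 10 nodes.
    public class GranularitySketch {
        static double jobLength(int totalTasks, int healthyNodes, int totalNodes) {
            // ceil() because whole tasks land on nodes; each task represents
            // (totalNodes / totalTasks) of one node's ideal share of the work.
            int tasksOnBusiestNode = (int) Math.ceil((double) totalTasks / healthyNodes);
            double workPerTask = (double) totalNodes / totalTasks;
            return tasksOnBusiestNode * workPerTask;
        }

        public static void main(String[] args) {
            System.out.println("10 tasks,  9 healthy nodes: " + jobLength(10, 9, 10));   // 2.0
            System.out.println("500 tasks, 9 healthy nodes: " + jobLength(500, 9, 10));  // ~1.12
        }
    }

So with only 10 coarse tasks the job takes roughly twice as long as the
ideal run, while with 500 granular tasks the same failure costs only around
12% -- before counting the per-task overhead discussed below.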

On the other hand, there is a cost associated with the setup and teardown of
tasks, as well as with merging their results for the reducers; breaking work
into more tasks is good up to a point, but going from 10 to 500 tasks is
likely to slow down the overall result in the average no-failure case. I
think most folks strive for an average task size of anywhere from 128 MB to
2048 MB of uncompressed input data. A 50x increase in task granularity would
push average task running times well below the threshold of usability.
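
For what it's worth, the usual way to steer toward that range is through the
input split size rather than an explicit task count. A hedged sketch against
the newer org.apache.hadoop.mapreduce API follows (the path and size are
made up, and property names and behaviour vary between Hadoop versions and
InputFormats):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeSketch {
        public static Job configure(Configuration conf) throws Exception {
            Job job = new Job(conf, "coarser-splits");
            FileInputFormat.addInputPath(job, new Path("/data/input"));
            // Ask for splits of at least 256 MB so each map task gets a
            // substantial chunk of input; roughly, map task count =
            // total input bytes / split size.
            FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);
            // Older-style equivalent (property name may differ by version):
            // conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);
            return job;
        }
    }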

- Aaron



> -----------
>
> The reason for wanting to know this is to perhaps discuss in more detail as
> to whether, in a situation where many faults on the cluster occur, an Hadoop
> job with many Map/Reduce tasks will handle the unreliability better than an
> Hadoop job that has much fewer Map/Reduce tasks?
>

Depends what you mean by "handle the unreliability." The MapReduce platform
will get your job done in either case. There is no inherent resilience or
weakness to failure based on number of tasks. As stated above, more granular
tasks may result in more even redistribution of additional work.


>
> If this were the case, is it true to say that, if the reliability of a
> Hadoop cluster (network reliability, DataNode reliability, etc.) were known
> before a job was submitted, the user submitting the job would want to
> adjust the number of Map/Reduce tasks depending on that reliability?
>
>
Not likely. Other parameters, such as the acceptable number of attempt
failures per task, various timeout values, etc., are considerably better
tuning parameters given a baseline reliability profile for a cluster.
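
For example (a hedged sketch against the old org.apache.hadoop.mapred API;
defaults and property names may differ between Hadoop versions):

    import org.apache.hadoop.mapred.JobConf;

    public class FailureTuningSketch {
        public static JobConf configure() {
            JobConf conf = new JobConf();
            // Allow each task up to 6 attempts before it (and the job) fails.
            conf.setMaxMapAttempts(6);
            conf.setMaxReduceAttempts(6);
            // Tolerate up to 5% of map tasks failing without failing the job.
            conf.setMaxMapTaskFailuresPercent(5);
            // Treat a task as hung if it reports no progress for 10 minutes.
            conf.setLong("mapred.task.timeout", 10 * 60 * 1000L);
            return conf;
        }
    }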

The likelihood that some node fails also grows with the size of the cluster:
a 200-node cluster will see node failures more often than a 10-node cluster.
And job size will grow with cluster size (you don't buy 200 nodes unless you
have a lot of work to do), so a job running "at scale" on 50, 100, or 200
nodes will often see anywhere from 500 to 10,000 map tasks anyway. This
already represents plenty of opportunity for granular work reassignment;
switching a given job from 3,000 to 6,000 map tasks is unlikely to improve
the overall running time very much. Since individual tasks in any
multi-thousand-task job will likely have a high degree of variance in
runtime, the jobtracker's efficiency at scheduling work won't necessarily be
improved by having that many more tasks anyway.
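
Using the same crude arithmetic as before (equal-sized tasks, failure at the
start, 1.0 = the ideal run on all nodes), the difference at that scale is
small:

    // 200 nodes, one fails: relative job length for 3,000 vs. 6,000 map tasks.
    public class AtScaleSketch {
        public static void main(String[] args) {
            int nodes = 200, healthy = 199;
            for (int tasks : new int[] {3000, 6000}) {
                int onBusiestNode = (int) Math.ceil((double) tasks / healthy);
                double length = onBusiestNode * ((double) nodes / tasks);
                System.out.println(tasks + " tasks -> " + length + "x ideal");
            }
        }
    }

Roughly 1.07x ideal versus 1.03x -- a few percent, easily swamped by the
variance in individual task runtimes.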

- Aaron


>
> I may be well off course with this idea, and if this is the case, do let me
> know!
>
>
> thanks,
>
>
> Rob Stewart
>