Posted to general@hadoop.apache.org by Owen O'Malley <om...@apache.org> on 2009/12/02 08:39:52 UTC

Re: Introducing Cloud MapReduce

On Nov 27, 2009, at 9:41 AM, Bruce Snyder wrote:

> 1) It is faster than other implementations (e.g., 60 times faster than
> Hadoop in one case. Speedup depends on the application and data.).

Based on your report, it looks like your comparison was done using
1.2GB of data in 92,000 files. With the default configuration,
you'll end up with 92,000 maps, each of them processing a very small
input. You need to either use MultiFileInputFormat or use Hadoop
Archives to bundle your tiny files into files of reasonable size. For
input data that small you should not have more than 8 maps unless each
map is doing a lot of CPU work.
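[To make the arithmetic above concrete, here is a rough sketch; the 1.2 GB and 92,000-file figures come from the report under discussion, while the per-map startup cost is an illustrative assumption, not a measurement.]

```python
# Back-of-the-envelope sketch of why 92,000 tiny maps lose to a few large
# ones. The 1.2 GB / 92,000-file figures come from the report discussed
# above; the per-map startup cost is an illustrative assumption.

TOTAL_BYTES = 1.2 * 1024**3   # 1.2 GB of input
NUM_FILES = 92_000
PER_MAP_OVERHEAD_S = 1.0      # assumed task startup/teardown cost per map

def total_overhead(num_maps: int) -> float:
    """Aggregate startup overhead across all map tasks."""
    return num_maps * PER_MAP_OVERHEAD_S

default_maps = NUM_FILES  # default config: one map per file (~13 KB each)
bundled_maps = 8          # bundled input: 8 maps of ~150 MB each

print(total_overhead(default_maps))            # task-seconds of pure overhead
print(total_overhead(bundled_maps))
print(TOTAL_BYTES / bundled_maps / 1024**2)    # MB of real work per bundled map
```

[Whatever the assumed per-map cost, the overhead term scales with the map count, which is why bundling 92,000 files down to a handful of splits dominates any tuning of the maps themselves.]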

In your reduce push iterator model, you give up a lot of determinism
in your application. Many applications that run in Hadoop depend on
the reduces getting keys in sorted order; getting keys in a random
order won't work for them. Furthermore, I don't see any way that you
can scale up your approach: you effectively need to hold open the
reduce state for every key you've seen, and you can't close any of
them until the input is done. That will quickly exhaust memory. So the
push model will work for something like word count, but fail on large
problems.
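[The memory argument can be illustrated with a small simulation; this is a sketch of the two models, not the Hadoop or Cloud MapReduce code. In the sort-based pull model at most one key's state is open at a time; in the push model the number of open states grows with the number of distinct keys.]

```python
# Illustrative contrast between the two reduce models:
# - pull model: keys arrive sorted, so each key's state is closed as soon as
#   the next key begins; at most one state is open at a time.
# - push model: keys arrive in arbitrary order, so every distinct key's
#   state must stay open until the whole input has been consumed.

from itertools import groupby
from operator import itemgetter

pairs = [("b", 1), ("a", 1), ("b", 2), ("c", 1), ("a", 3)]

def pull_reduce(pairs):
    """Sort first, then stream one key at a time."""
    peak_open, out = 0, {}
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        peak_open = max(peak_open, 1)          # only the current key is open
        out[key] = sum(v for _, v in group)
    return out, peak_open

def push_reduce(pairs):
    """Accumulate state for every key seen so far."""
    open_states, peak_open = {}, 0
    for key, value in pairs:
        open_states[key] = open_states.get(key, 0) + value
        peak_open = max(peak_open, len(open_states))  # grows with distinct keys
    return open_states, peak_open

print(pull_reduce(pairs))  # ({'a': 4, 'b': 3, 'c': 1}, 1)
print(push_reduce(pairs))  # ({'b': 3, 'a': 4, 'c': 1}, 3)
```

[Both produce the same sums, but the push model's peak equals the number of distinct keys in the partition, which is the quantity that can exhaust memory on large key spaces.]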

When you add types in, you should add them in as:

K1, V1 -> map -> K2,V2 -> reduce -> K3,V3

with the combiner taking:

K2,V2 -> combiner -> K2,V2

Of course the types are often not primitives and using something like  
Avro is a good thing.
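[The signatures above can be sketched with word count; this is an illustrative Python mock-up of the type discipline, not the Hadoop API, with K1 = line offset, V1 = line text, K2 = word, V2 = partial count, and K3/V3 = word/total.]

```python
# Word count with the type discipline made explicit:
#   (K1, V1) -> map -> (K2, V2) -> reduce -> (K3, V3)
# and the combiner preserving the intermediate types: (K2, V2) -> (K2, V2).
# Sketch only; real Hadoop jobs declare these types via job configuration.

from collections import defaultdict
from typing import Iterator

def map_fn(offset: int, line: str) -> Iterator[tuple[str, int]]:
    """(K1=int, V1=str) -> (K2=str, V2=int)"""
    for word in line.split():
        yield word, 1

def combine_fn(word: str, counts: list[int]) -> tuple[str, int]:
    """(K2, [V2]) -> (K2, V2): same key/value types in and out."""
    return word, sum(counts)

def reduce_fn(word: str, counts: list[int]) -> tuple[str, int]:
    """(K2, [V2]) -> (K3=str, V3=int)"""
    return word, sum(counts)

lines = {0: "to be or not", 13: "to be"}
grouped: defaultdict[str, list[int]] = defaultdict(list)
for offset, line in lines.items():
    local: defaultdict[str, list[int]] = defaultdict(list)
    for k2, v2 in map_fn(offset, line):
        local[k2].append(v2)
    # The combiner runs map-side, shrinking (K2, V2) pairs before the shuffle.
    for k2, vs in local.items():
        k, v = combine_fn(k2, vs)
        grouped[k].append(v)

result = dict(reduce_fn(k, vs) for k, vs in grouped.items())
print(result)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

[Because the combiner maps (K2, V2) back to (K2, V2), the framework is free to run it zero or more times without changing the reduce output.]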

-- Owen

Re: Introducing Cloud MapReduce

Posted by goutham patnaik <go...@gmail.com>.
See inline.

On Thu, Dec 3, 2009 at 9:59 AM, <hu...@accenture.com> wrote:

> Owen,
>
> Great questions. You are hitting some key points, answers are inline.
>
> > > 1) It is faster than other implementations (e.g., 60 times faster
> > than
> > > Hadoop in one case. Speedup depends on the application and data.).
> >
> > Based on your report, it looks like your comparison was done using
> > 1.2GB of data in 92,000 files. If you took the default configuration,
> > you'll end up with 92,000 maps, each of them processing a very small
> > input. You either need to use MultiFileInputFormat or use Hadoop
> > Archives to bundle your tiny files into reasonable size. For input
> > data that small you should not have more than 8 maps unless each map
> > is doing a lot of CPU work.
>
> We used the default configuration because we were not sure how to optimize
> it. We can certainly rerun the test with your suggestion. The point is not
> the 60x speedup itself; it is a potential bottleneck in the master/slave
> architecture. When you scale up the number of slave nodes and the number of
> tasks, you will run into the same problem even if you use a larger chunk
> size. Because we lack access to a large cluster, we are not able to run an
> experiment showing that the master node will choke at some point. This is
> essentially a scaled-down version of that same large-scale test.
>
> > In your reduce push iterator model, you give up a lot of determinism
> > in your application. Many applications that run in Hadoop depend on
> > the reduces getting keys in sorted order. Getting keys in a random
> > order won't work.
>
> Sorting at the end after the values have been reduced is much cheaper than
> sorting the whole set of intermediate key-value pairs. We will add a
> function in our implementation to sort at the end, for those applications
> requiring ordered output.
>

Owen is right though - there are some applications that require the keys to
be sorted when they reach a reducer; secondary sorting is one such example.
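[Secondary sort is worth spelling out, since it only works because the framework delivers keys in sorted order. A minimal sketch in Python, not Hadoop code: records are sorted on a composite key (natural key, secondary field), but reduce calls are grouped by the natural key alone, so each reducer sees its values pre-ordered.]

```python
# Secondary sort sketch: sort on the composite key (user, timestamp), group
# reduce calls by the natural key (user) only. The reducer then receives each
# user's events already in timestamp order -- "for free", but only because
# keys arrive sorted. A push model with random key order cannot offer this.

from itertools import groupby

records = [("alice", 30, "logout"), ("bob", 5, "login"),
           ("alice", 10, "login"), ("alice", 20, "click")]

# Sort comparator: order by the full composite key.
shuffled = sorted(records, key=lambda r: (r[0], r[1]))

# Grouping comparator: one reduce call per natural key.
sessions = {user: [event for _, _, event in group]
            for user, group in groupby(shuffled, key=lambda r: r[0])}

print(sessions)  # {'alice': ['login', 'click', 'logout'], 'bob': ['login']}
```

[Sorting the final reduced output, as proposed above, cannot recover this: the ordering that secondary sort relies on is among the values *within* one reduce call, not among the reduced results.]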

>
> > Furthermore, I don't see any way that you can scale
> > up your approach. You effectively need to hold open all of the reduce
> > states for the keys you've seen and can't close them until it is done.
> > Again, that will quickly exhaust memory. So the push model will work
> > for something like word count, but fail on large problems.
>
> Our strategy is to use a large number of partitions to limit the number of
> keys in each partition. A large number of partitions would not overload our
> system, so this is easy for us to do. As I said in a related thread
> answering Todd's question, I believe the state held for each key is small
> for most applications. We would love to be proved wrong, so that we have a
> real motivation to refine the architecture to handle that case.
>
> > When you add types in, you should add them in as:
> >
> > K1, V1 -> map -> K2,V2 -> reduce -> K3,V3
> >
> > with the combiner taking:
> >
> > K2,V2 -> combiner -> K2,V2
> >
> > Of course the types are often not primitives and using something like
> > Avro is a good thing.
>
> Thanks for clarifying. Adding types is on the roadmap now.
>
> Regards.
>
> -Huan
>
