Posted to user@hbase.apache.org by David Poisson <Da...@ca.fujitsu.com> on 2013/07/02 15:59:22 UTC
RE: Profiling map reduce jobs?

Hi,
     I'm using the newer version of the MapReduce API (org.apache.hadoop.mapreduce, not org.apache.hadoop.mapred). I was under the impression that classes from the older API don't work well with the newer one. Do you have an example of using the older OutputFormat with the newer API, by any chance?

Cheers,

David Poisson

________________________________________
From: Azuryy Yu [azuryyyu@gmail.com]
Sent: Saturday, June 29, 2013 10:34 AM
To: user@hbase.apache.org
Subject: RE: Profiling map reduce jobs?

I'd just advise using MultipleOutputFormat instead of MultipleOutputs.write

--Sent from my Sony mobile.
On Jun 29, 2013 9:16 PM, "David Poisson" <Da...@ca.fujitsu.com>
wrote:

> Just thought I'd provide some insight into our problem.
>
> It appears that the slowdown was caused by our use of
> multipleOutputs.write(output, key, keyValue, path) (going from memory
> here). Looking at the implementation of that write method in
> MultipleOutputs.java, it appears that every call to
> write(output, key, keyValue, path) creates a context, fetches a conf,
> and fetches a new RecordWriter.
>
> We have changed all of those calls to write(output, key, keyValue) (which
> doesn't do any of that extra work), and it seems to help.
>
> Does anyone else have any tips for using MultipleOutputs?
>
> We are taking our input and splitting it into 3 files, so MultipleOutputs
> seems to be a natural choice. Performance is a bit slow, though.
>
> Cheers!
>
> David
> ________________________________________
> From: David Poisson [David.Poisson@ca.fujitsu.com]
> Sent: Thursday, June 27, 2013 4:22 PM
> To: user@hbase.apache.org
> Subject: Profiling map reduce jobs?
>
> Howdy,
>      I want to take a look at a MR job which seems to be slower than I had
> hoped. Mind you, this MR job is only running on a pseudo-distributed VM
> (Cloudera CDH4).
>
> I have modified my mapred-site.xml with the following (that last one is
> commented out because it crashes my MR job):
>
>   <property>
>     <name>mapred.task.profile</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>mapred.task.profile.maps</name>
>     <value>0-2</value>
>   </property>
>   <property>
>     <name>mapred.task.profile.reduces</name>
>     <value>0-2</value>
>   </property>
>   <!--property>
>     <name>mapred.task.profile.params</name>
>     <value>agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s</value>
>   </property-->
>
> Are there any resources that explain how to interpret the results?
> Or maybe an open-source app that could help display the results in a more
> intuitive manner?
>
> Ideally, we'd want to know where we are spending most of our time.
>
> Cheers,
>
> David
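
A note on the commented-out mapred.task.profile.params property in the original message: the value as posted starts with "agentlib:hprof=...", whereas the JVM option is spelled "-agentlib:hprof=..." with a leading dash, and the stock Hadoop default for this property includes that dash. Without it, the child JVM would likely treat the string as something other than an option, which may be why the job crashed. This was not confirmed on the thread, so the following is only a sketch of the property with the dash added, assuming every other option is kept exactly as posted:

```xml
<!-- Assumption: only the leading "-" is added; all other hprof options are as posted. -->
<!-- Hadoop replaces %s with the name of the profiling output file when the task runs. -->
<property>
  <name>mapred.task.profile.params</name>
  <value>-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s</value>
</property>
```

With cpu=samples, the hprof text output ends with a CPU SAMPLES section that ranks methods by how often they were at the top of the sampled stacks, which is usually the first place to look when asking where the time goes.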