
Posted to user@spark.apache.org by Umar Javed <um...@gmail.com> on 2013/11/21 06:20:26 UTC

time taken to fetch input partition by map

Hi,

The metrics report, for reduce tasks (i.e. shuffle readers), the time taken
to fetch the shuffle outputs. Is there a way to find out the time taken by a
map task (i.e. a shuffle writer) on a remote machine to read its input
partition from disk?

I believe I should look in HadoopRDD.scala, which calls getRecordReader.
The imports there show that the reader type is
org.apache.hadoop.mapred.RecordReader, but I can't find that file
anywhere.

Any help would be appreciated.

thanks!
Umar

Re: time taken to fetch input partition by map

Posted by Evan Chan <ev...@ooyala.com>.

Hi Umar,

It's fine to look into hooking into HadoopRDD, though I think we need a
general-purpose way to provide metrics and progress for non-Hadoop RDDs
(i.e. RDDs that aren't based on an InputFormat). Any ideas would be great. :)
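
One possible shape for such a general-purpose hook, sketched in Java under
the assumption that a partition's records are exposed as a plain iterator
(Spark's RDD.compute does return an iterator). TimedIterator, elapsedNanos
and recordsRead are hypothetical names for illustration, not part of Spark
or Hadoop. The wrapper accumulates the wall-clock time spent inside the
source iterator, so the figure includes decompression and deserialization
done there, not only disk I/O.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper, not part of Spark or Hadoop: wraps any iterator and
// accumulates the wall-clock time spent inside hasNext() and next().
final class TimedIterator<T> implements Iterator<T> {
    private final Iterator<T> underlying;
    private long elapsedNanos = 0L;
    private long recordsRead = 0L;

    TimedIterator(Iterator<T> underlying) {
        this.underlying = underlying;
    }

    @Override
    public boolean hasNext() {
        long start = System.nanoTime();
        try {
            return underlying.hasNext();
        } finally {
            // Count the time even if the source throws.
            elapsedNanos += System.nanoTime() - start;
        }
    }

    @Override
    public T next() {
        long start = System.nanoTime();
        try {
            T value = underlying.next();
            recordsRead++;
            return value;
        } finally {
            elapsedNanos += System.nanoTime() - start;
        }
    }

    // Total nanoseconds spent in the wrapped iterator so far.
    long elapsedNanos() {
        return elapsedNanos;
    }

    // Number of records successfully returned by next().
    long recordsRead() {
        return recordsRead;
    }

    public static void main(String[] args) {
        // Stand-in for a partition's records; a real source would read from disk.
        TimedIterator<String> it =
            new TimedIterator<>(Arrays.asList("a", "b", "c").iterator());
        while (it.hasNext()) {
            it.next();
        }
        System.out.println(it.recordsRead() + " records, "
            + it.elapsedNanos() + " ns spent in the source iterator");
    }
}
```

Because the wrapper only depends on the iterator interface, the same idea
would apply to an iterator built on a Hadoop RecordReader or to any other
input source; how the totals get reported back to the driver is left open.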

-Evan






-- 
Evan Chan
Staff Engineer
ev@ooyala.com
