Posted to common-user@hadoop.apache.org by "W.P. McNeill" <bi...@gmail.com> on 2011/09/30 01:15:01 UTC

How do I diagnose IO bounded errors using the framework counters?

I have a problem where certain Hadoop jobs take prohibitively long to run.
My hypothesis is that I am generating more I/O than my cluster can handle,
and I need to substantiate this. I am looking closely at the Map-Reduce
Framework counters because I think they contain the information I need, but
I don't understand what the various File System Counters are telling me. Is
there a pointer to a list of exactly what all these counters mean? (So far
my online research has only turned up other people asking the same
question.)

In particular, I suspect that my mapper job--which may write multiple <key,
value> pairs for each one it receives--is writing too many and the values
are too large, but I'm not sure how to test this quantitatively.
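(To make this concrete: here is a rough sketch of how I could instrument my
mapper with custom counters that tally serialized value sizes, so I can
compare them against the built-in counters. The counter group/names and the
Text key/value types below are placeholders, not what my real job uses.)

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch only: tallies the serialized size of the values a mapper reads
    // and writes into custom counters, for comparison with "Map output bytes".
    public class ByteCountingMapper extends Mapper<Text, Text, Text, Text> {
      @Override
      protected void map(Text key, Text value, Context context)
          throws IOException, InterruptedException {
        // Bytes of the incoming value (Text.getLength() is the UTF-8 byte count).
        context.getCounter("DIAG", "INPUT_VALUE_BYTES").increment(value.getLength());

        // ... real map logic would go here; this sketch just re-emits the pair ...
        context.getCounter("DIAG", "OUTPUT_VALUE_BYTES").increment(value.getLength());
        context.write(key, value);
      }
    }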

Specific questions:

   1. I assume "Map input records" is the total of all <key, value> pairs
   coming into the mappers and "Map output records" is the total of all <key,
   value> pairs written by the mapper. Is this correct?
   2. What is "Map output bytes"? Is this the total number of bytes in all
   the <key, value> pairs written by the mapper?
   3. How would I calculate a corresponding "Map input bytes"? Why doesn't
   that counter exist?
   4. What is the relationship between the FILE_BYTES_READ/WRITTEN and
   HDFS_BYTES_READ/WRITTEN counters? What exactly do they mean, and how do
   they relate to the "Map output bytes" counter? (A sketch for dumping all
   of these side by side follows this list.)
   5. Sometimes the FILE bytes read and written values are an order of
   magnitude larger than the corresponding HDFS values, and sometimes it's the
   other way around. How do I go about interpreting this?
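To compare these numbers across jobs, here is a minimal sketch (my own
hypothetical DumpCounters class, using the old mapred API) that dumps every
counter of a finished job, so the FILE_*/HDFS_* byte counts and "Map output
bytes" can be read side by side. The job ID passed on the command line is a
placeholder.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    // Sketch: print group, counter name, and value for every counter of a job.
    public class DumpCounters {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(new Configuration(), DumpCounters.class);
        JobClient client = new JobClient(conf);
        // args[0] is a job ID string such as job_201109300115_0001 (placeholder)
        RunningJob job = client.getJob(JobID.forName(args[0]));
        Counters counters = job.getCounters();
        for (Counters.Group group : counters) {
          for (Counters.Counter counter : group) {
            System.out.printf("%s\t%s\t%d%n",
                group.getDisplayName(), counter.getDisplayName(),
                counter.getCounter());
          }
        }
      }
    }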

Re: How do I diagnose IO bounded errors using the framework counters?

Posted by John Meagher <jo...@gmail.com>.
The counter names are created dynamically in mapred.Task:

  /**
   * Counters to measure the usage of the different file systems.
   * Always return the String array with two elements. First one is the
   * name of BYTES_READ counter and second one is of the BYTES_WRITTEN
   * counter.
   */
  protected static String[] getFileSystemCounterNames(String uriScheme) {
    String scheme = uriScheme.toUpperCase();
    return new String[]{scheme+"_BYTES_READ", scheme+"_BYTES_WRITTEN"};
  }
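This also explains why grepping for the literal string FILE_BYTES_READ turns
up nothing in the main source: the names are assembled at run time from the
URI scheme of whichever FileSystem the task touched ("file" for the local
disk, "hdfs" for HDFS), so FILE_* generally reflects intermediate data
(spills, shuffle, merges) on local disk while HDFS_* reflects the job's input
and output. If you want to see the underlying per-scheme numbers yourself,
here is a rough sketch (assuming your Hadoop version has
FileSystem.getAllStatistics() and Statistics.getScheme(); older 0.20 releases
may differ) that could be called from inside a task, e.g. in cleanup():

    import java.util.List;
    import org.apache.hadoop.fs.FileSystem;

    // Sketch: print the per-scheme byte statistics that the task framework
    // copies into the <SCHEME>_BYTES_READ/WRITTEN counters.
    public class FsStatsPeek {
      public static void dump() {
        List<FileSystem.Statistics> stats = FileSystem.getAllStatistics();
        for (FileSystem.Statistics s : stats) {
          String scheme = s.getScheme().toUpperCase();
          System.out.println(scheme + "_BYTES_READ=" + s.getBytesRead()
              + " " + scheme + "_BYTES_WRITTEN=" + s.getBytesWritten());
        }
      }
    }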


On Tue, Oct 4, 2011 at 17:22, W.P. McNeill <bi...@gmail.com> wrote:
> Here's an even more basic question. I tried to figure out what
> FILE_BYTES_READ means by searching every file in the Hadoop 0.20.203.0
> installation for the string FILE_BYTES_READ by running
>
>      find . -type f | xargs grep FILE_BYTES_READ
>
> I only found this string in source files in the vaidya contrib directory
> and the tools/rumen directories. Nothing in the main source tree.
>
> Where in the source code are these counters created and updated?
>

Re: How do I diagnose IO bounded errors using the framework counters?

Posted by "W.P. McNeill" <bi...@gmail.com>.
Here's an even more basic question. I tried to figure out what
FILE_BYTES_READ means by searching every file in the Hadoop 0.20.203.0
installation for the string FILE_BYTES_READ by running

      find . -type f | xargs grep FILE_BYTES_READ

I only found this string in source files in the vaidya contrib directory
and the tools/rumen directories. Nothing in the main source tree.

Where in the source code are these counters created and updated?

Re: How do I diagnose IO bounded errors using the framework counters?

Posted by "W.P. McNeill" <bi...@gmail.com>.
This is definitely a map-increase job.

I could try a combiner, but I don't think that would help. My keys are small
compared to my values, and the values must be kept separate when they are
accumulated in the reducer--they can't be combined into some smaller form,
i.e. they are more like bitmaps than word counts. So the only I/O a combiner
would save me is the duplication of (relatively small) keys plus Hadoop's
overhead for each <key, value> pair, which is going to be swamped by the
values themselves.

On Thu, Sep 29, 2011 at 4:29 PM, Lance Norskog <go...@gmail.com> wrote:

> When in doubt, go straight to the owner of a fact. The operating system is
> what really knows disk I/O.
> "my mapper job--which may write multiple <key,value> pairs for each one it
> receives--is writing too many" - ah, a map-increase job :) This is what
> Combiners are for: to keep explosions of data from hitting the network by
> combining on the mapper machine.
>
> On Thu, Sep 29, 2011 at 4:15 PM, W.P. McNeill <bi...@gmail.com> wrote:
>
> > I have a problem where certain Hadoop jobs take prohibitively long to
> > run. My hypothesis is that I am generating more I/O than my cluster can
> > handle, and I need to substantiate this. I am looking closely at the
> > Map-Reduce Framework counters because I think they contain the
> > information I need, but I don't understand what the various File System
> > Counters are telling me. Is there a pointer to a list of exactly what
> > all these counters mean? (So far my online research has only turned up
> > other people asking the same question.)
> >
> > In particular, I suspect that my mapper job--which may write multiple
> > <key, value> pairs for each one it receives--is writing too many and the
> > values are too large, but I'm not sure how to test this quantitatively.
> >
> > Specific questions:
> >
> >   1. I assume "Map input records" is the total of all <key, value> pairs
> >   coming into the mappers and "Map output records" is the total of all
> >   <key, value> pairs written by the mapper. Is this correct?
> >   2. What is "Map output bytes"? Is this the total number of bytes in
> >   all the <key, value> pairs written by the mapper?
> >   3. How would I calculate a corresponding "Map input bytes"? Why
> >   doesn't that counter exist?
> >   4. What is the relationship between the FILE_BYTES_READ/WRITTEN and
> >   HDFS_BYTES_READ/WRITTEN counters? What exactly do they mean, and how
> >   do they relate to the "Map output bytes" counter?
> >   5. Sometimes the FILE bytes read and written values are an order of
> >   magnitude larger than the corresponding HDFS values, and sometimes
> >   it's the other way around. How do I go about interpreting this?
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>

Re: How do I diagnose IO bounded errors using the framework counters?

Posted by Lance Norskog <go...@gmail.com>.
When in doubt, go straight to the owner of a fact. The operating system is
what really knows disk I/O.
"my mapper job--which may write multiple <key,value> pairs for each one it
receives--is writing too many" - ah, a map-increase job :) This is what
Combiners are for: to keep explosions of data from hitting the network by
combining on the mapper machine.
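For the word-count-style case where values really can be merged, wiring one
in is just a reducer registered as the combiner. A rough sketch (the class
and the Text/IntWritable types here are placeholders, not from your job):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sketch of a sum combiner: many small map-output records for the same
    // key collapse into one record before they hit the network.
    public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
      private final IntWritable total = new IntWritable();

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
          sum += v.get();
        }
        total.set(sum);
        context.write(key, total);
      }
    }

It gets registered with job.setCombinerClass(SumCombiner.class) in the new
API, or JobConf.setCombinerClass in the old one.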

On Thu, Sep 29, 2011 at 4:15 PM, W.P. McNeill <bi...@gmail.com> wrote:

> I have a problem where certain Hadoop jobs take prohibitively long to run.
> My hypothesis is that I am generating more I/O than my cluster can handle,
> and I need to substantiate this. I am looking closely at the Map-Reduce
> Framework counters because I think they contain the information I need, but
> I don't understand what the various File System Counters are telling me. Is
> there a pointer to a list of exactly what all these counters mean? (So far
> my online research has only turned up other people asking the same
> question.)
>
> In particular, I suspect that my mapper job--which may write multiple <key,
> value> pairs for each one it receives--is writing too many and the values
> are too large, but I'm not sure how to test this quantitatively.
>
> Specific questions:
>
>   1. I assume "Map input records" is the total of all <key, value> pairs
>   coming into the mappers and "Map output records" is the total of all
>   <key, value> pairs written by the mapper. Is this correct?
>   2. What is "Map output bytes"? Is this the total number of bytes in all
>   the <key, value> pairs written by the mapper?
>   3. How would I calculate a corresponding "Map input bytes"? Why doesn't
>   that counter exist?
>   4. What is the relationship between the FILE_BYTES_READ/WRITTEN and
>   HDFS_BYTES_READ/WRITTEN counters? What exactly do they mean, and how do
>   they relate to the "Map output bytes" counter?
>   5. Sometimes the FILE bytes read and written values are an order of
>   magnitude larger than the corresponding HDFS values, and sometimes it's
>   the other way around. How do I go about interpreting this?
>



-- 
Lance Norskog
goksron@gmail.com