
Posted to common-user@hadoop.apache.org by Philippe Signoret <ph...@gmail.com> on 2013/04/05 18:01:16 UTC

MAP_INPUT_BYTES missing from counters

I noticed recently that some Word Count jobs I've run are finishing with
the MAP_INPUT_BYTES counter missing.

I'm using Hadoop 1.1.2 on 5 nodes with a mostly default configuration. The
input was a single 100KB text file.

Questions:

   - Is it normal for some final counter values not to be present?
   - Is MAP_INPUT_BYTES the best way to determine total input data size? (I
   do so programmatically, while it's running and after the job is complete.)

The counters I did get:

Job Counters
 TOTAL_LAUNCHED_REDUCES: 1
 SLOTS_MILLIS_MAPS: 6006
 FALLOW_SLOTS_MILLIS_REDUCES: 0
 FALLOW_SLOTS_MILLIS_MAPS: 0
 TOTAL_LAUNCHED_MAPS: 1
 DATA_LOCAL_MAPS: 1
 SLOTS_MILLIS_REDUCES: 9293
File Output Format Counters
 BYTES_WRITTEN: 366752
FileSystemCounters
 FILE_BYTES_READ: 505552
 HDFS_BYTES_READ: 1085517
 FILE_BYTES_WRITTEN: 1122685
 HDFS_BYTES_WRITTEN: 366752
File Input Format Counters
 BYTES_READ: 1085357
Map-Reduce Framework
 MAP_OUTPUT_MATERIALIZED_BYTES: 505552
 MAP_INPUT_RECORDS: 19446
 REDUCE_SHUFFLE_BYTES: 505552
 SPILLED_RECORDS: 70358
 MAP_OUTPUT_BYTES: 1750111
 CPU_MILLISECONDS: 5700
 COMMITTED_HEAP_BYTES: 401997824
 COMBINE_INPUT_RECORDS: 181151
 SPLIT_RAW_BYTES: 160
 REDUCE_INPUT_RECORDS: 35179
 REDUCE_INPUT_GROUPS: 35179
 COMBINE_OUTPUT_RECORDS: 35179
 PHYSICAL_MEMORY_BYTES: 378482688
 REDUCE_OUTPUT_RECORDS: 35179
 VIRTUAL_MEMORY_BYTES: 1139838976
 MAP_OUTPUT_RECORDS: 181151
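
When MAP_INPUT_BYTES is absent, one possible fallback for the second question is the File Input Format Counters BYTES_READ value. The Java sketch below is a hedged illustration, not Hadoop API: it parses a plain-text counter dump in the format shown above and prefers MAP_INPUT_BYTES, falling back to BYTES_READ. The class name, method names, parsing rules and fallback order are all assumptions; in a real job you would read the values from the job's counters through the Hadoop client API rather than from text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

/**
 * Hypothetical helper: parses a plain-text counter dump (one "NAME: value"
 * per line, group headers without a value) and picks an input-size figure.
 * Assumes counter names are unique across groups, which holds for the dump
 * in this thread but not in general.
 */
public class CounterInputSize {

    /** Parses lines such as " BYTES_READ: 1085357" or " SLOTS_MILLIS_MAPS:\t6006". */
    static Map<String, Long> parse(String dump) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : dump.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // group header such as "Job Counters"
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim(); // trim() also drops tabs
            try {
                counters.put(name, Long.parseLong(value));
            } catch (NumberFormatException e) {
                // not a "NAME: number" line; skip it
            }
        }
        return counters;
    }

    /** Prefers MAP_INPUT_BYTES, falls back to BYTES_READ (an assumed policy). */
    static OptionalLong inputBytes(Map<String, Long> counters) {
        for (String name : new String[] {"MAP_INPUT_BYTES", "BYTES_READ"}) {
            Long value = counters.get(name);
            if (value != null) {
                return OptionalLong.of(value);
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
            "FileSystemCounters",
            " HDFS_BYTES_READ: 1085517",
            "File Input Format Counters",
            " BYTES_READ: 1085357",
            "Map-Reduce Framework",
            " SPLIT_RAW_BYTES: 160");
        Map<String, Long> c = parse(dump);
        System.out.println(inputBytes(c).getAsLong()); // prints 1085357
        System.out.println(c.get("HDFS_BYTES_READ") - c.get("BYTES_READ")); // prints 160
    }
}
```

With the numbers in this post, HDFS_BYTES_READ (1085517) exceeds BYTES_READ (1085357) by exactly SPLIT_RAW_BYTES (160), so the HDFS figure appears to include the split metadata in addition to the file contents; BYTES_READ is the closer match to the input file itself.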


Here are most of the relevant screens from the JobTracker web interface:
http://jsfiddle.net/Fguyy/2/embedded/result/

Here is the JobTracker log (relevant time frame):
http://pastebin.com/dvsMn4fB

Thanks!
Philippe

-------------------------------
*Philippe Signoret*
Skype: philippesignoret
+33 6 95 89 55 55

Re: MAP_INPUT_BYTES missing from counters

Posted by Philippe Signoret <ph...@gmail.com>.

Nope, regular simple text file (.txt from Gutenberg).

I'll keep looking into it and try to reproduce it consistently.

Thanks!
Philippe
On Apr 6, 2013 1:39 PM, "yypvsxf19870706" <yy...@gmail.com> wrote:

> Hi,
>
> Is your input file compressed, or named with a .gz suffix or similar?
> That is interesting.
>
> MAP_INPUT_BYTES is the number of bytes of uncompressed input consumed by
> all the maps in the job. It is incremented every time a record is read
> from a RecordReader and passed to the map's map() method by the framework.
> [Hadoop: The Definitive Guide, p. 226]
>
> Please let us know if you find out anything further.
>
> Regards.
>
>
>
> Sent from my iPhone

Re: MAP_INPUT_BYTES missing from counters

Posted by yypvsxf19870706 <yy...@gmail.com>.

Hi,

Is your input file compressed, or named with a .gz suffix or similar? That is interesting.

MAP_INPUT_BYTES is the number of bytes of uncompressed input consumed by all the maps in the job. It is incremented every time a record is read from a RecordReader and passed to the map's map() method by the framework. [Hadoop: The Definitive Guide, p. 226]
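
The per-record counting described above can be modelled in a few lines. This Java sketch is a simplified, hypothetical stand-in for a framework record reader, not Hadoop's implementation: it treats each line as one record and, every time it hands back a record, adds the number of bytes the read position advanced to an input-bytes counter. It operates on bytes that are already uncompressed, mirroring the "uncompressed input" wording of the definition.

```java
import java.nio.charset.StandardCharsets;

/**
 * Simplified model (hypothetical, not Hadoop code) of a record reader that
 * updates input counters each time it returns a record: one record per line,
 * and the byte counter grows by how far the read position advanced.
 */
public class CountingLineReader {
    private final byte[] data; // already-uncompressed input
    private int pos = 0;
    long inputRecords = 0; // analogous to MAP_INPUT_RECORDS
    long inputBytes = 0;   // analogous to MAP_INPUT_BYTES

    CountingLineReader(byte[] data) {
        this.data = data;
    }

    /** Returns the next line without its '\n', or null at end of input. */
    String next() {
        if (pos >= data.length) {
            return null;
        }
        int before = pos;
        int end = before;
        while (end < data.length && data[end] != '\n') {
            end++;
        }
        String line = new String(data, before, end - before, StandardCharsets.UTF_8);
        pos = (end < data.length) ? end + 1 : end; // consume the newline, if any
        inputRecords++;
        inputBytes += pos - before; // bytes consumed for this record, newline included
        return line;
    }

    public static void main(String[] args) {
        byte[] text = "the quick\nbrown fox\n".getBytes(StandardCharsets.UTF_8);
        CountingLineReader reader = new CountingLineReader(text);
        while (reader.next() != null) {
            // a real framework would pass each record to map() here
        }
        System.out.println(reader.inputRecords + " records, " + reader.inputBytes + " bytes");
        // prints "2 records, 20 bytes"
    }
}
```

Because the counter is only updated when a record is actually returned, a counter defined this way reflects what the maps consumed rather than the size of the file on disk.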

Please let us know if you find out anything further.

Regards.



Sent from my iPhone
