Posted to common-user@hadoop.apache.org by Philippe Signoret <ph...@gmail.com> on 2013/04/05 18:01:16 UTC
MAP_INPUT_BYTES missing from counters
I noticed recently that some Word Count jobs I've run are finishing with
the MAP_INPUT_BYTES counter missing.
I'm using Hadoop 1.1.2 on 5 nodes with a mostly default configuration. The
input was a single 100KB text file.
Questions:
- Is it normal for some final counter values not to be present?
- Is MAP_INPUT_BYTES the best way to determine total input data size? (I
do so programmatically, while the job is running and after it is complete.)
The counters I did get:
Job Counters
  TOTAL_LAUNCHED_REDUCES: 1
  SLOTS_MILLIS_MAPS: 6006
  FALLOW_SLOTS_MILLIS_REDUCES: 0
  FALLOW_SLOTS_MILLIS_MAPS: 0
  TOTAL_LAUNCHED_MAPS: 1
  DATA_LOCAL_MAPS: 1
  SLOTS_MILLIS_REDUCES: 9293
File Output Format Counters
  BYTES_WRITTEN: 366752
FileSystemCounters
  FILE_BYTES_READ: 505552
  HDFS_BYTES_READ: 1085517
  FILE_BYTES_WRITTEN: 1122685
  HDFS_BYTES_WRITTEN: 366752
File Input Format Counters
  BYTES_READ: 1085357
Map-Reduce Framework
  MAP_OUTPUT_MATERIALIZED_BYTES: 505552
  MAP_INPUT_RECORDS: 19446
  REDUCE_SHUFFLE_BYTES: 505552
  SPILLED_RECORDS: 70358
  MAP_OUTPUT_BYTES: 1750111
  CPU_MILLISECONDS: 5700
  COMMITTED_HEAP_BYTES: 401997824
  COMBINE_INPUT_RECORDS: 181151
  SPLIT_RAW_BYTES: 160
  REDUCE_INPUT_RECORDS: 35179
  REDUCE_INPUT_GROUPS: 35179
  COMBINE_OUTPUT_RECORDS: 35179
  PHYSICAL_MEMORY_BYTES: 378482688
  REDUCE_OUTPUT_RECORDS: 35179
  VIRTUAL_MEMORY_BYTES: 1139838976
  MAP_OUTPUT_RECORDS: 181151
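
For code that reads these values programmatically, one defensive option is to treat any single counter as optional and fall back to another byte counter when it is absent. The following is only a rough sketch of that idea. It parses a text listing like the one above instead of calling the Hadoop API, and the class name `CounterLookup`, its methods, and the fallback order are all hypothetical. Treating BYTES_READ or HDFS_BYTES_READ as a stand-in for MAP_INPUT_BYTES is an assumption, not something confirmed in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

public class CounterLookup {

    /** Parses a listing where "NAME: value" lines follow a group-name line (a line without a colon). */
    public static Map<String, Map<String, Long>> parse(String dump) {
        Map<String, Map<String, Long>> groups = new LinkedHashMap<>();
        Map<String, Long> current = null;
        for (String raw : dump.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                // No colon: this line names a new counter group.
                current = new LinkedHashMap<>();
                groups.put(line, current);
            } else if (current != null) {
                String name = line.substring(0, colon).trim();
                long value = Long.parseLong(line.substring(colon + 1).trim());
                current.put(name, value);
            }
        }
        return groups;
    }

    /** Returns the counter value, or empty if the group or the counter is missing. */
    public static OptionalLong find(Map<String, Map<String, Long>> groups, String group, String name) {
        Map<String, Long> counters = groups.get(group);
        if (counters == null || !counters.containsKey(name)) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(counters.get(name));
    }

    /** Hypothetical fallback order (an assumption): MAP_INPUT_BYTES, then BYTES_READ, then HDFS_BYTES_READ. */
    public static OptionalLong inputBytes(Map<String, Map<String, Long>> groups) {
        OptionalLong value = find(groups, "Map-Reduce Framework", "MAP_INPUT_BYTES");
        if (value.isPresent()) {
            return value;
        }
        value = find(groups, "File Input Format Counters", "BYTES_READ");
        if (value.isPresent()) {
            return value;
        }
        return find(groups, "FileSystemCounters", "HDFS_BYTES_READ");
    }

    public static void main(String[] args) {
        // A few lines copied from the listing above; MAP_INPUT_BYTES is absent.
        String dump = String.join("\n",
                "FileSystemCounters",
                "HDFS_BYTES_READ: 1085517",
                "File Input Format Counters",
                "BYTES_READ: 1085357",
                "Map-Reduce Framework",
                "MAP_INPUT_RECORDS: 19446");
        OptionalLong size = inputBytes(parse(dump));
        if (size.isPresent()) {
            System.out.println("input bytes (best available counter): " + size.getAsLong());
        } else {
            System.out.println("no input-size counter found");
        }
    }
}
```

The candidates are not interchangeable: in the listing above, HDFS_BYTES_READ (1085517) exceeds BYTES_READ (1085357) by 160, which is exactly the SPLIT_RAW_BYTES value.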
Here are most of the relevant screens from the JobTracker web interface:
http://jsfiddle.net/Fguyy/2/embedded/result/
Here is the JobTracker log (relevant time frame):
http://pastebin.com/dvsMn4fB
Thanks!
Philippe
-------------------------------
*Philippe Signoret*
Skype: philippesignoret
+33 6 95 89 55 55
Re: MAP_INPUT_BYTES missing from counters
Posted by Philippe Signoret <ph...@gmail.com>.
Nope, it's a regular plain text file (a .txt from Project Gutenberg).
I'll keep looking into it and try to reproduce consistently.
Thanks!
Philippe
On Apr 6, 2013 1:39 PM, "yypvsxf19870706" <yy...@gmail.com> wrote:
> Hi,
>
> Is your input file compressed, or named with a .gz suffix or something
> similar?
> This is interesting.
> MAP_INPUT_BYTES is the number of bytes of uncompressed input consumed by
> all the maps in the job. It is incremented every time a record is read
> from a RecordReader and passed to the map's map method by the framework
> [Hadoop: The Definitive Guide, page 226].
>
> Please let us know if you find anything further.
>
> Regards.
Re: MAP_INPUT_BYTES missing from counters
Posted by yypvsxf19870706 <yy...@gmail.com>.
Hi
Is your input file compressed, or named with a .gz suffix or something similar?
This is interesting.
MAP_INPUT_BYTES is the number of bytes of uncompressed input consumed by all the maps in the job. It is incremented every time a record is read from a RecordReader and passed to the map's map method by the framework [Hadoop: The Definitive Guide, page 226].
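
To make that definition concrete, here is a toy sketch of the counting it describes: counters are bumped once per record, just before the record is handed to the map function. This is not Hadoop code. `TrackedReaderSketch`, its `Counters` class, and the `run` method are hypothetical names, and the sketch assumes each record is a line of UTF-8 text terminated by a single newline byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class TrackedReaderSketch {

    /** Hypothetical stand-in for the framework's per-job counters. */
    public static final class Counters {
        public long mapInputRecords;
        public long mapInputBytes;
    }

    /**
     * Passes each record to the map function, updating the counters first.
     * Assumption: one record is one line, so its size is the UTF-8 length
     * of the text plus one byte for the trailing newline.
     */
    public static Counters run(List<String> records, Consumer<String> map) {
        Counters counters = new Counters();
        for (String record : records) {
            counters.mapInputRecords++;
            counters.mapInputBytes += record.getBytes(StandardCharsets.UTF_8).length + 1;
            map.accept(record);
        }
        return counters;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "foo bar");
        // The map function here only prints the record it receives.
        Counters counters = run(lines, record -> System.out.println("map(" + record + ")"));
        System.out.println("MAP_INPUT_RECORDS: " + counters.mapInputRecords);
        System.out.println("MAP_INPUT_BYTES: " + counters.mapInputBytes);
    }
}
```

Under this model the byte count reflects the uncompressed record content, regardless of how many bytes were read from disk, which is why the compression question matters.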
Please let us know if you find anything further.
Regards.