Posted to common-user@hadoop.apache.org by maha <ma...@umail.ucsb.edu> on 2011/02/25 20:34:09 UTC

Lost in HDFS_BYTES_READ/WRITTEN

Hello, please help me clear up my ideas!

When a reducer reads map-output data remotely, is that reflected in HDFS_BYTES_READ?

Or is HDFS_BYTES_READ/WRITTEN only counted at the start and end of a job, i.e. the first data read by the maps as input and the last data written by the reducers as output for the user to see?


Thank you in advance,

Maha

Re: Lost in HDFS_BYTES_READ/WRITTEN

Posted by maha <ma...@umail.ucsb.edu>.
Thanks for your reply, Harsh, but this is confusing me even more :(

I can't experiment with this because I'm using a single machine right now, and everything is reported as local reads/writes.

Or can I?
I'm using the line "hdfs = FileSystem.get(getConf());", which I think means that the instance created is distributed,
but the job counters never use it for intermediate results (e.g. when reducers read map outputs).
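Which filesystem FileSystem.get(getConf()) returns depends on the configured default filesystem URI (fs.default.name in Hadoop 0.20, whose built-in default is file:///), not on whether the code runs inside a job. The sketch below is a hypothetical pure-Java model, not Hadoop code: a plain Map stands in for Configuration, and the names DefaultFsModel, defaultScheme and readCounterName are made up for illustration. It resolves the scheme from that setting and derives the counter name a read would be reported under, assuming the upper-cased "SCHEME_BYTES_READ" naming. With nothing configured on a single machine it gives FILE_BYTES_READ, which would match everything showing up as local reads.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

// Hypothetical model, not Hadoop code: a plain Map stands in for Configuration.
final class DefaultFsModel {
    // Key that Hadoop 0.20 consults for the default filesystem.
    static final String DEFAULT_FS_KEY = "fs.default.name";
    // Built-in default (core-default.xml): the local filesystem.
    static final String BUILTIN_DEFAULT = "file:///";

    // Scheme of the filesystem that FileSystem.get(conf) would hand back.
    static String defaultScheme(Map<String, String> conf) {
        URI uri = URI.create(conf.getOrDefault(DEFAULT_FS_KEY, BUILTIN_DEFAULT));
        // Simplification of this model: a scheme-less value is treated as local.
        return uri.getScheme() == null ? "file" : uri.getScheme();
    }

    // Counter a read through that instance would be reported under
    // (assumed naming: upper-cased scheme + "_BYTES_READ").
    static String readCounterName(Map<String, String> conf) {
        return defaultScheme(conf).toUpperCase(Locale.ROOT) + "_BYTES_READ";
    }

    public static void main(String[] args) {
        // Standalone single machine, nothing configured: prints FILE_BYTES_READ
        System.out.println(readCounterName(Map.of()));
        // Example pseudo-distributed value: prints HDFS_BYTES_READ
        System.out.println(readCounterName(Map.of(DEFAULT_FS_KEY, "hdfs://localhost:9000")));
    }
}
```

Pointing the default filesystem at a NameNode (even one on localhost) is what would make such an instance an HDFS client rather than the local filesystem.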

So if you can answer my question further, I would truly appreciate it!

Maha

On Feb 25, 2011, at 12:00 PM, Harsh J wrote:

> From what I could gather, every FileSystem instance puts an entry
> into a static 'statistics' map, and this map is used to update the
> counters for each Task. Hence, all operations done on the same HDFS
> URI, whether by the task itself or by your application code, would
> all be counted together. In fact, even if you are reading from another
> HDFS, only the URI scheme is matched, so those reads would aggregate
> into the same counter as well.
> 
> I'm not very sure of this, though. Writing a simple test should be
> enough to find out the truth.


Re: Lost in HDFS_BYTES_READ/WRITTEN

Posted by Harsh J <qw...@gmail.com>.
From what I could gather, every FileSystem instance puts an entry
into a static 'statistics' map, and this map is used to update the
counters for each Task. Hence, all operations done on the same HDFS
URI, whether by the task itself or by your application code, would
all be counted together. In fact, even if you are reading from another
HDFS, only the URI scheme is matched, so those reads would aggregate
into the same counter as well.

I'm not very sure of this, though. Writing a simple test should be
enough to find out the truth.
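Harsh's description of a shared, scheme-keyed statistics table can be made concrete with a toy model. The sketch below is hypothetical and simplified: SchemeStatsModel, recordRead, counters and reset are invented names, not Hadoop's FileSystem.Statistics API, and the real implementation may key its table differently. It keeps one static byte count per URI scheme, shared by every caller, and derives per-task counters from it, so reads by the framework, by user code, and from a second HDFS cluster all accumulate under a single HDFS_BYTES_READ counter. It only illustrates the claim; confirming it would still take a test against a real cluster, as suggested above.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the mechanism described above;
// it is not Hadoop's FileSystem.Statistics class.
final class SchemeStatsModel {
    // One shared entry per URI scheme, regardless of host/port (authority).
    private static final Map<String, AtomicLong> BYTES_READ = new ConcurrentHashMap<>();

    // Called after every read, whether from the framework's own task I/O
    // or from user code. Assumes every URI carries a scheme.
    static void recordRead(String uri, long bytes) {
        String scheme = URI.create(uri).getScheme();
        BYTES_READ.computeIfAbsent(scheme, s -> new AtomicLong()).addAndGet(bytes);
    }

    // What a task would report: one counter per scheme, e.g. HDFS_BYTES_READ.
    static Map<String, Long> counters() {
        Map<String, Long> out = new TreeMap<>();
        BYTES_READ.forEach((scheme, n) ->
                out.put(scheme.toUpperCase(Locale.ROOT) + "_BYTES_READ", n.get()));
        return out;
    }

    static void reset() {
        BYTES_READ.clear();
    }

    public static void main(String[] args) {
        recordRead("hdfs://clusterA:9000/input/part-0", 100);   // task input
        recordRead("hdfs://clusterA:9000/side/lookup.txt", 20); // user code, same cluster
        recordRead("hdfs://clusterB:8020/other/data", 5);       // a different HDFS cluster
        recordRead("file:///tmp/local.txt", 7);                 // local filesystem
        System.out.println(counters()); // {FILE_BYTES_READ=7, HDFS_BYTES_READ=125}
    }
}
```

Because the key is only the scheme, the model cannot tell clusterA from clusterB, which is exactly the aggregation Harsh suspects.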


-- 
Harsh J
www.harshj.com