Posted to dev@pig.apache.org by Haijun Cao <ha...@kindsight.net> on 2008/06/06 01:25:01 UTC

RE: local bytes read/written


I am getting worried about the huge number of bytes written to the local
fs. I have a 2-machine cluster: one machine has 100% io util and the
other has 10-20% io util during the map phase. The input data is
replicated on both machines (replication = 2), so I suspect the extra
80-90% io on the first machine is caused by reads/writes to the local fs.

Which machine and which directory does this "local fs" refer to? I would
like to check for myself whether it is the culprit.

Thanks.
Haijun 

-----Original Message-----
From: Haijun Cao [mailto:haijun@kindsight.net] 
Sent: Wednesday, June 04, 2008 10:44 PM
To: pig-user@incubator.apache.org
Subject: local bytes read/written

Hi,

I just started using Pig; it is really fun to write Pig queries.

I noticed that the map reduce job page reports bytes read/written
from/to the local file system, and the number is 2x to 3x the bytes
read/written to Hadoop. I just want to understand the internal workings
of Pig a little better: what operations read/write to the local fs? For
what purpose? Is it the local fs of the data nodes? Which directory?

Thanks
Haijun 

Re: local bytes read/written

Posted by Alan Gates <ga...@yahoo-inc.com>.
During map reduce, Hadoop creates a number of temporary files.  These
include the output of the maps, and any dumps that the sort/merge
algorithm has to do.  All of these are written to the local fs.  Only
final outputs are written to HDFS.  That's why you're seeing so much
more local io.

Alan.

Haijun Cao wrote:
> I am getting worried about the huge number of bytes written to the local
> fs. I have a 2-machine cluster: one machine has 100% io util and the
> other has 10-20% io util during the map phase. The input data is
> replicated on both machines (replication = 2), so I suspect the extra
> 80-90% io on the first machine is caused by reads/writes to the local fs.
>
> Which machine and which directory does this "local fs" refer to? I would
> like to check for myself whether it is the culprit.
>
> Thanks.
> Haijun 
>
> -----Original Message-----
> From: Haijun Cao [mailto:haijun@kindsight.net] 
> Sent: Wednesday, June 04, 2008 10:44 PM
> To: pig-user@incubator.apache.org
> Subject: local bytes read/written
>
> Hi,
>
> I just started using Pig; it is really fun to write Pig queries.
>
> I noticed that the map reduce job page reports bytes read/written
> from/to the local file system, and the number is 2x to 3x the bytes
> read/written to Hadoop. I just want to understand the internal workings
> of Pig a little better: what operations read/write to the local fs? For
> what purpose? Is it the local fs of the data nodes? Which directory?
>
> Thanks
> Haijun 
>   
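
For anyone who wants to check the temporary files described above on a
given node, here is a minimal sketch that totals the bytes currently
sitting under one or more local directories. It is an illustration, not
part of Pig or Hadoop. Which directory to point it at is an assumption:
in Hadoop releases of this period the location of map reduce
intermediate files was, as far as I know, set by the mapred.local.dir
property in the site configuration, so the value configured on your own
cluster is the one to use. The function name dir_usage_bytes is made up
for this example.

```python
import os
import sys


def dir_usage_bytes(root):
    """Return the total size in bytes of files under root.

    Symlinks are not followed. A missing directory counts as 0 bytes.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reports the entry itself rather than a link target.
                total += os.lstat(path).st_size
            except OSError:
                # Temporary files can disappear while a job is running.
                continue
    return total


if __name__ == "__main__":
    # Pass the directories to inspect on the command line.
    for root in sys.argv[1:]:
        print(root, dir_usage_bytes(root))
```

Running it a few times on each machine while the map phase is active
should show whether the busy node is the one accumulating intermediate
data. It reports only what is on disk at that moment, not cumulative
bytes read or written, so it complements the counters on the job page
rather than replacing them.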
