Posted to user@pig.apache.org by Prashanth Pappu <pr...@conviva.com> on 2008/06/04 02:00:13 UTC

PIG performance

I just upgraded my PIG source to the top of svn trunk and am seeing really
poor performance with group and cogroup queries.

Is there a recommended svn revision to use with Hadoop 0.17?

Prashanth

Re: PIG performance

Posted by pi song <pi...@gmail.com>.

I think saying "slow down" is too subjective. When we get a performance boost
in most areas, some smaller areas might perform worse than before.
I saw a Jira issue that discussed introducing a standard way to benchmark,
but I haven't seen any progress on it.


On Thu, Jun 5, 2008 at 5:05 AM, Olga Natkovich <ol...@yahoo-inc.com> wrote:

> We saw some slowdown in the unit tests but not in our end-to-end tests.
>
> Olga
>
> > -----Original Message-----
> > From: prashanth.rinera@gmail.com
> > [mailto:prashanth.rinera@gmail.com] On Behalf Of Prashanth Pappu
> > Sent: Wednesday, June 04, 2008 11:16 AM
> > To: pig-user@incubator.apache.org
> > Subject: Re: PIG performance
> >
> > > Yes, the same queries were substantially slower with Hadoop
> > > 0.17 and the latest svn of PIG (compared to Hadoop 0.16 and
> > > svn 653894).
> > >
> > > It is hard for me to copy the exact query as there is a lot
> > > of context, but I narrowed it down to cogroup statements. I can
> > > try to create an example. But I wanted to check whether there is
> > > a specific svn revision that others use with Hadoop 0.17. Or
> > > does everyone use the top of svn trunk?
> > > I ask because upgrading from Hadoop 0.16 and an older svn revision
> > > to the latest svn revision did degrade performance quite a bit.
> >
> > Prashanth
> >
> > On Wed, Jun 4, 2008 at 10:51 AM, Olga Natkovich
> > <ol...@yahoo-inc.com> wrote:
> >
> > > > Do you mean that the same query ran faster with some previous
> > > > code you ran? Which code was that?
> > >
> > > Olga
> > >
> > > > -----Original Message-----
> > > > From: prashanth.rinera@gmail.com
> > > > [mailto:prashanth.rinera@gmail.com] On Behalf Of Prashanth Pappu
> > > > Sent: Tuesday, June 03, 2008 5:00 PM
> > > > To: pig-user@incubator.apache.org
> > > > Subject: PIG performance
> > > >
> > > > > I just upgraded my PIG source to the top of svn trunk and am
> > > > > seeing really poor performance with group and cogroup queries.
> > > > >
> > > > > Is there a recommended svn revision to use with Hadoop 0.17?
> > > >
> > > > Prashanth
> > > >
> > >
> >
>

RE: PIG performance

Posted by Olga Natkovich <ol...@yahoo-inc.com>.

We saw some slowdown in the unit tests but not in our end-to-end tests.

Olga 

> -----Original Message-----
> From: prashanth.rinera@gmail.com 
> [mailto:prashanth.rinera@gmail.com] On Behalf Of Prashanth Pappu
> Sent: Wednesday, June 04, 2008 11:16 AM
> To: pig-user@incubator.apache.org
> Subject: Re: PIG performance
> 
> Yes, the same queries were substantially slower with Hadoop
> 0.17 and the latest svn of PIG (compared to Hadoop 0.16 and
> svn 653894).
>
> It is hard for me to copy the exact query as there is a lot
> of context, but I narrowed it down to cogroup statements. I can
> try to create an example. But I wanted to check whether there is
> a specific svn revision that others use with Hadoop 0.17. Or
> does everyone use the top of svn trunk?
> I ask because upgrading from Hadoop 0.16 and an older svn revision
> to the latest svn revision did degrade performance quite a bit.
> 
> Prashanth
> 
> On Wed, Jun 4, 2008 at 10:51 AM, Olga Natkovich 
> <ol...@yahoo-inc.com> wrote:
> 
> > Do you mean that the same query ran faster with some previous
> > code you ran? Which code was that?
> >
> > Olga
> >
> > > -----Original Message-----
> > > From: prashanth.rinera@gmail.com
> > > [mailto:prashanth.rinera@gmail.com] On Behalf Of Prashanth Pappu
> > > Sent: Tuesday, June 03, 2008 5:00 PM
> > > To: pig-user@incubator.apache.org
> > > Subject: PIG performance
> > >
> > > I just upgraded my PIG source to the top of svn trunk and am
> > > seeing really poor performance with group and cogroup queries.
> > >
> > > Is there a recommended svn revision to use with Hadoop 0.17?
> > >
> > > Prashanth
> > >
> >
>

Re: PIG performance

Posted by Prashanth Pappu <pr...@conviva.com>.

Yes, the same queries were substantially slower with Hadoop 0.17 and the
latest svn of PIG (compared to Hadoop 0.16 and svn 653894).

It is hard for me to copy the exact query as there is a lot of context, but I
narrowed it down to cogroup statements. I can try to create an example. But I
wanted to check whether there is a specific svn revision that others use with
Hadoop 0.17. Or does everyone use the top of svn trunk?
I ask because upgrading from Hadoop 0.16 and an older svn revision to the
latest svn revision did degrade performance quite a bit.

Prashanth

On Wed, Jun 4, 2008 at 10:51 AM, Olga Natkovich <ol...@yahoo-inc.com> wrote:

> Do you mean that the same query ran faster with some previous code you
> ran? Which code was that?
>
> Olga
>
> > -----Original Message-----
> > From: prashanth.rinera@gmail.com
> > [mailto:prashanth.rinera@gmail.com] On Behalf Of Prashanth Pappu
> > Sent: Tuesday, June 03, 2008 5:00 PM
> > To: pig-user@incubator.apache.org
> > Subject: PIG performance
> >
> > I just upgraded my PIG source to the top of svn trunk and am seeing
> > really poor performance with group and cogroup queries.
> >
> > Is there a recommended svn revision to use with Hadoop 0.17?
> >
> > Prashanth
> >
>

Re: local bytes read/written

Posted by Alan Gates <ga...@yahoo-inc.com>.

During a map-reduce job, Hadoop creates a number of temporary files.  These
include the output of the maps and any dumps that the sort/merge algorithm
has to do.  All of these are written to the local fs.  Only final outputs are
written to HDFS.  That's why you're seeing so much more local I/O.

Alan.
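
On the "which directory" question, here is a minimal sketch (not part of
Alan's reply) of how one might look up the directories on each tasktracker
node. It assumes the intermediate files go under the directories named by the
Hadoop property mapred.local.dir, and that the stock defaults are as written
in ASSUMED_DEFAULTS below. Those defaults are recalled from
hadoop-default.xml and should be checked against your own installation. The
function name local_dirs is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed stock defaults (recalled from hadoop-default.xml; verify locally).
ASSUMED_DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "mapred.local.dir": "${hadoop.tmp.dir}/mapred/local",
}

_VAR = re.compile(r"\$\{([^}]+)\}")


def local_dirs(site_xml_text, user):
    """Return the directories named by mapred.local.dir, with ${...} expanded."""
    props = dict(ASSUMED_DEFAULTS)
    props["user.name"] = user
    # Settings from the site file override the assumed defaults.
    for prop in ET.fromstring(site_xml_text).findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name and value is not None:
            props[name.strip()] = value.strip()

    value = props["mapred.local.dir"]
    # Expand nested ${var} references; the loop bound guards against cycles.
    for _ in range(20):
        expanded = _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    # The property may hold a comma-separated list of directories.
    return [d.strip() for d in value.split(",") if d.strip()]


if __name__ == "__main__":
    # An empty site file falls back to the assumed defaults.
    print(local_dirs("<configuration></configuration>", user="haijun"))
```

In practice you would pass in the contents of the site configuration file
from the node whose disk is busy, then watch the returned directories during
the map phase to see whether they account for the extra I/O.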

Haijun Cao wrote:
> I am getting worried about the huge number of bytes written to the local
> fs. I have a 2-machine cluster: during the map phase one machine has 100%
> I/O utilization and the other has 10-20%, and the input data is replicated
> on both machines (replication = 2). So I suspect the extra 80-90% I/O on
> the first machine is caused by reads/writes to the local fs.
>
> Which machine and which directory does this "local fs" refer to? I would
> like to check for myself whether it is the culprit.
>
> Thanks.
> Haijun 
>
> -----Original Message-----
> From: Haijun Cao [mailto:haijun@kindsight.net] 
> Sent: Wednesday, June 04, 2008 10:44 PM
> To: pig-user@incubator.apache.org
> Subject: local bytes read/written
>
> Hi,
>
> I just started using Pig, and it is really fun to write Pig queries.
>
> I noticed that the map-reduce job page reports bytes read from/written
> to the local file system, and the number is 2x to 3x the bytes
> read from/written to Hadoop. I just want to understand the internal
> workings of Pig a little better: which operations read from/write to the
> local fs, and for what purpose? Is it the local fs of the data nodes?
> Which directory?
>
> Thanks
> Haijun 
>

RE: local bytes read/written

Posted by Haijun Cao <ha...@kindsight.net>.


I am getting worried about the huge number of bytes written to the local
fs. I have a 2-machine cluster: during the map phase one machine has 100%
I/O utilization and the other has 10-20%, and the input data is replicated
on both machines (replication = 2). So I suspect the extra 80-90% I/O on
the first machine is caused by reads/writes to the local fs.

Which machine and which directory does this "local fs" refer to? I would
like to check for myself whether it is the culprit.

Thanks.
Haijun 

-----Original Message-----
From: Haijun Cao [mailto:haijun@kindsight.net] 
Sent: Wednesday, June 04, 2008 10:44 PM
To: pig-user@incubator.apache.org
Subject: local bytes read/written

Hi,

I just started using Pig, and it is really fun to write Pig queries.

I noticed that the map-reduce job page reports bytes read from/written
to the local file system, and the number is 2x to 3x the bytes
read from/written to Hadoop. I just want to understand the internal
workings of Pig a little better: which operations read from/write to the
local fs, and for what purpose? Is it the local fs of the data nodes?
Which directory?

Thanks
Haijun

local bytes read/written

Posted by Haijun Cao <ha...@kindsight.net>.

Hi,

I just started using Pig, and it is really fun to write Pig queries.

I noticed that the map-reduce job page reports bytes read from/written
to the local file system, and the number is 2x to 3x the bytes
read from/written to Hadoop. I just want to understand the internal
workings of Pig a little better: which operations read from/write to the
local fs, and for what purpose? Is it the local fs of the data nodes?
Which directory?

Thanks
Haijun

RE: PIG performance

Posted by Olga Natkovich <ol...@yahoo-inc.com>.

Do you mean that the same query ran faster with some previous code you
ran? Which code was that?

Olga 

> -----Original Message-----
> From: prashanth.rinera@gmail.com 
> [mailto:prashanth.rinera@gmail.com] On Behalf Of Prashanth Pappu
> Sent: Tuesday, June 03, 2008 5:00 PM
> To: pig-user@incubator.apache.org
> Subject: PIG performance
> 
> I just upgraded my PIG source to the top of svn trunk and am seeing
> really poor performance with group and cogroup queries.
>
> Is there a recommended svn revision to use with Hadoop 0.17?
> 
> Prashanth
>