Posted to dev@pig.apache.org by Aniket Mokashi <an...@gmail.com> on 2013/02/20 22:18:57 UTC

Replicated join: is there a setting to make this better?

Hi devs,

I was looking into the size/record limitations of the fragment replicated
join (map join) in Pig. To test this, I loaded a map (aka fragment) of longs
into an alias and joined it with another alias that had a few other columns.
With a 50 MB map file, I saw GC overhead errors on the mappers. I took a heap
dump of a mapper to look into what was causing them, and found that the
memory footprint of the fragment itself was high.
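
For context, here is a minimal sketch of the kind of script I was testing
(the paths and aliases below are made up for illustration). In a replicated
join, the relations listed after the first one are shipped to every mapper
and loaded into memory:

    -- hypothetical repro: 'small' is the in-memory (replicated) side
    big   = LOAD '/data/big'   AS (key:long, a:chararray, b:int);
    small = LOAD '/data/longs' AS (key:long);  -- ~50 MB file of longs
    j     = JOIN big BY key, small BY key USING 'replicated';
    STORE j INTO '/data/joined';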

[image: Inline image 1]

Note that the hashmap was able to load only about 1.8 million records:
[image: Inline image 2]
The reason is that every map record carries an overhead of about 1.5 KB. Most
of it is part of the retained heap, but it still needs to be garbage collected.
[image: Inline image 3]

So, it turns out:

Heap required by the map join above = 1.5 KB * number of records + size of
input (uncompressed DataByteArray), assuming the key is a long.

So, to run a replicated join, you need to satisfy the following criterion:

*1.5 KB * number of records + size of input (uncompressed) < estimated free
memory in the mapper (total heap - io.sort.mb - some minor constant, etc.)*
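
To put numbers on that: the 1.8 million records above already account for
roughly 1.8 M * 1.5 KB ≈ 2.7 GB of HashMap overhead before the 50 MB of input
bytes are even counted, so a map file that looks tiny on disk can exhaust a
multi-GB mapper heap.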

Is that the right conclusion? Is there a setting or another way to make this
better?

Thanks,

Aniket


Re: Replicated join: is there a setting to make this better?

Posted by Jonathan Coveney <jc...@gmail.com>.
One quick way to vastly improve memory efficiency is to use the SchemaTuple
addition:

https://issues.apache.org/jira/browse/PIG-2359

This should cut memory use in half, at least.
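
A minimal sketch of turning it on, assuming the property names that came with
that patch (it is off by default, and the per-feature switch for replicated
joins is my best recollection, so please verify against your build):

    -- assumed property names from the PIG-2359 SchemaTuple work
    SET pig.schematuple true;          -- master switch, off by default
    SET pig.schematuple.fr_join true;  -- apply to fragment-replicate joins

    big   = LOAD '/data/big'   AS (key:long, a:chararray, b:int);
    small = LOAD '/data/longs' AS (key:long);
    j     = JOIN big BY key, small BY key USING 'replicated';

Note that SchemaTuple only kicks in when the relations have declared schemas,
as in the AS clauses above.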


Re: Replicated join: is there a setting to make this better?

Posted by Aniket Mokashi <an...@gmail.com>.
Interesting. I found this in the 0.11 documentation:

Fragment replicate joins are experimental; we don't have a strong sense of
how small the small relation must be to fit into memory. In our tests with
a simple query that involves just a JOIN, a relation of up to 100 M can be
used if the process overall gets 1 GB of memory. Please share your
observations and experience with us.
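
As a quick sanity check against the numbers above (my arithmetic, not from
the docs): a 100 MB relation of longs is roughly 13 million records, which at
~1.5 KB of per-record overhead would need on the order of 19 GB of heap, so
that guideline presumably assumes far fewer, larger records.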

Let me open a JIRA to share some of my experience with this, or do we already
have one?

~Aniket


-- 
"...:::Aniket:::... Quetzalco@tl"

Re: Replicated join: is there a setting to make this better?

Posted by Prashant Kommireddi <pr...@gmail.com>.
Mailing lists don't support attachments. Is JIRA a place we can discuss
this? Based on the outcome, we could classify it as an improvement, a bug, or
"Not a Problem".

-Prashant


Re: Replicated join: is there a setting to make this better?

Posted by Aniket Mokashi <an...@gmail.com>.
Thanks, Johnny. I am not sure how to post these images on mailing lists! :(


-- 
"...:::Aniket:::... Quetzalco@tl"

Re: Replicated join: is there a setting to make this better?

Posted by Johnny Zhang <xi...@cloudera.com>.
Hi Aniket,
Your image is blank :) Not sure if this only happens to me, though.

Johnny


Fwd: Replicated join: is there a setting to make this better?

Posted by Aniket Mokashi <an...@gmail.com>.
I think the email was filtered out. Resending.

-- 
"...:::Aniket:::... Quetzalco@tl"
