Posted to user@pig.apache.org by Sean Barry <sb...@cricketcommunications.com> on 2011/08/02 20:43:18 UTC

Proactive Spill Count Recs

org.apache.pig.PigCounters

Counter                               Map  Reduce     Total
PROACTIVE_SPILL_COUNT_RECS            0    2,372,598  2,372,598
SPILLABLE_MEMORY_MANAGER_SPILL_COUNT  0    64         64
PROACTIVE_SPILL_COUNT_BAGS



I was checking my jobtracker and I have no idea what these three counters represent...
Can anyone shed some light, please?

-SB

Re: Proactive Spill Count Recs

Posted by Thejas Nair <th...@hortonworks.com>.
There are two ways Pig controls the memory used by large bags:
1. Triggers set on GC, similar to the mechanism Julien describes here:
https://techblug.wordpress.com/2011/07/21/detecting-low-memory-in-java-part-2/
When Pig is notified of high memory usage, it goes through the
list of Spillable bags that have registered with SpillableMemoryManager
and calls .spill() on the large bags.
SPILLABLE_MEMORY_MANAGER_SPILL_COUNT is the number of spillable bags 
that have been spilled to disk.


2. The above method was not found to be very reliable; sometimes the
notification of memory usage came too late. So self-spilling
InternalCachedBags were introduced. They start spilling to disk when the
memory they are estimated to use exceeds a configurable threshold.
Bags of this type are used in almost all cases where Pig operations are
likely to produce large bags, such as (co)group, distinct on a bag
column, etc.
There are two counters for this type of bag:
PROACTIVE_SPILL_COUNT_BAGS - number of InternalCachedBags that spilled
PROACTIVE_SPILL_COUNT_RECS - number of records in InternalCachedBags
that were written out in the spill.
The config parameter for this is described here:
http://pig.apache.org/docs/r0.9.0/perf.html#memory-management
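A minimal Java sketch of the self-spilling idea, under stated assumptions: `SelfSpillingBag`, its crude byte-size estimate, and the two static counter fields are hypothetical stand-ins, not Pig's InternalCachedBag or PigCounters API. Records stay in memory until the estimated size would pass a fixed threshold; after that, every new record goes to a temporary file. The bag counter is bumped once when a bag starts spilling, the record counter once per record written, following the two counter descriptions above.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical self-spilling bag of String records; not Pig's InternalCachedBag. */
public class SelfSpillingBag implements AutoCloseable {
    // Hypothetical stand-ins for the PROACTIVE_SPILL_COUNT_BAGS / _RECS counters.
    static long spilledBags = 0;
    static long spilledRecords = 0;

    private final long memLimitBytes;          // configurable in-memory threshold
    private final List<String> inMemory = new ArrayList<>();
    private long estimatedBytes = 0;
    private File spillFile;
    private BufferedWriter spillOut;           // stays null until the bag first spills

    public SelfSpillingBag(long memLimitBytes) {
        this.memLimitBytes = memLimitBytes;
    }

    /** Crude size estimate (assumption): fixed object overhead plus 2 bytes per char. */
    private static long estimate(String rec) {
        return 40L + 2L * rec.length();
    }

    public void add(String rec) {
        long size = estimate(rec);
        if (spillOut == null && estimatedBytes + size <= memLimitBytes) {
            inMemory.add(rec);                 // still under the threshold
            estimatedBytes += size;
            return;
        }
        try {
            if (spillOut == null) {
                // First time over the threshold: open a spill file, count this bag once.
                spillFile = File.createTempFile("bag-spill", ".tmp");
                spillFile.deleteOnExit();
                spillOut = new BufferedWriter(new FileWriter(spillFile));
                spilledBags++;
            }
            spillOut.write(rec);
            spillOut.newLine();
            spilledRecords++;                  // one increment per record written out
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public int inMemoryCount() {
        return inMemory.size();
    }

    @Override
    public void close() {
        try {
            if (spillOut != null) spillOut.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (spillFile != null) spillFile.delete();
        }
    }

    public static void main(String[] args) {
        // Each 10-char record is estimated at 60 bytes, so 16 fit under a 1,000-byte limit.
        try (SelfSpillingBag bag = new SelfSpillingBag(1_000)) {
            for (int i = 0; i < 100; i++) bag.add(String.format("record%04d", i));
            System.out.println(bag.inMemoryCount() + " in memory, " + spilledRecords
                    + " records spilled, " + spilledBags + " bag spilled");
        }
    }
}
```

Running `main` prints `16 in memory, 84 records spilled, 1 bag spilled`: one bag adds 1 to the bag counter but 84 to the record counter, which is why the records counter can be far larger than the bags counter.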


So if you see these counters, it means that some of the bags Pig
dealt with were large and had to be spilled to disk. Allocating more
memory to Pig tasks (-Xmx) might help. I will create a JIRA so that I
remember to write some documentation for this.
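Why more heap can help, as a rough sketch: if the spill threshold is expressed as a fraction of the task's maximum heap (0.2 below is only an example value; the actual parameter and its default are in the memory-management documentation linked above), then raising -Xmx raises the number of bytes a bag may hold before it spills. `SpillBudget.spillBudget` is a hypothetical helper, not a Pig API.

```java
/** Hypothetical helper: bytes a bag may use before spilling, as a fraction of max heap. */
public class SpillBudget {
    static long spillBudget(long maxHeapBytes, double fraction) {
        if (fraction <= 0 || fraction > 1) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        return (long) (maxHeapBytes * fraction);
    }

    public static void main(String[] args) {
        double fraction = 0.2;                        // example value only
        // Runtime.maxMemory() roughly tracks the -Xmx setting of this JVM.
        long currentMax = Runtime.getRuntime().maxMemory();
        System.out.println("budget now: " + spillBudget(currentMax, fraction) + " bytes");
        long oneGiB = 1L << 30;
        System.out.println("-Xmx1g -> " + spillBudget(oneGiB, fraction));
        System.out.println("-Xmx2g -> " + spillBudget(2 * oneGiB, fraction));
    }
}
```

With the example fraction, a 1 GiB heap gives a budget of 214748364 bytes and a 2 GiB heap gives 429496729 bytes.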

-Thejas

Re: Proactive Spill Count Recs

Posted by Dmitriy Ryaboy <dv...@gmail.com> on 8/3/11 10:31 AM.
Daniel,
IIRC, spill requests are triggered by a GC, while spill_count is incremented
by an actual spill, so the former number may be a bit misleading (if GC is
effective, lots of GCs might be fine).
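To illustrate the difference between spill requests and actual spills, here is a minimal Java sketch of GC-triggered spilling using the standard `java.lang.management` collection-usage-threshold notifications, one common way to get such GC-time signals in Java. The `Spillable` interface, the registry, the size cutoff, and both counters are hypothetical; this is not Pig's SpillableMemoryManager. Every notification counts as a request, but only objects still above the cutoff are spilled, so the two numbers can diverge.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

/** Hypothetical GC-triggered spill manager (illustration only). */
public class GcSpillManager implements NotificationListener {
    /** Hypothetical interface for objects that can free memory by writing to disk. */
    public interface Spillable {
        long estimatedSize();
        void spill();
    }

    final AtomicLong spillRequests = new AtomicLong();  // one per low-memory notification
    final AtomicLong actualSpills = new AtomicLong();   // one per spill() actually called
    private final List<Spillable> registered = new CopyOnWriteArrayList<>();
    private final long minSpillBytes;                   // hypothetical cutoff for "large"

    public GcSpillManager(long minSpillBytes) {
        this.minSpillBytes = minSpillBytes;
    }

    public void register(Spillable s) {
        registered.add(s);
    }

    /** Ask the JVM to notify us when a heap pool is still above `fraction` full after a GC. */
    public void install(double fraction) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isCollectionUsageThresholdSupported()) {
                long max = pool.getUsage().getMax();     // -1 if the pool has no defined max
                if (max > 0) pool.setCollectionUsageThreshold((long) (max * fraction));
            }
        }
        NotificationEmitter emitter = (NotificationEmitter) ManagementFactory.getMemoryMXBean();
        emitter.addNotificationListener(this, null, null);
    }

    @Override
    public void handleNotification(Notification n, Object handback) {
        if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED.equals(n.getType())) {
            onLowMemory();
        }
    }

    /** One spill request: spill only the registered objects that are still large. */
    void onLowMemory() {
        spillRequests.incrementAndGet();
        for (Spillable s : registered) {
            if (s.estimatedSize() >= minSpillBytes) {
                s.spill();
                actualSpills.incrementAndGet();
            }
        }
    }
}
```

Notifications arrive on a JMX notification thread rather than the task thread, which is why the sketch uses a copy-on-write registry and atomic counters. If a bag shrinks after its first spill, repeated GC notifications keep adding requests without adding spills.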

D

Re: Proactive Spill Count Recs

Posted by Daniel Dai <da...@hortonworks.com> on Wed, Aug 3, 2011 at 10:12 AM.
A spill means Pig needs to dump memory to disk. It happens when Pig
deals with a large key and runs short of memory. A high number
indicates that Pig needs to write to disk frequently and performance may
degrade; you may want to explore other approaches, such as using a skewed join.
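A simplified Java sketch of the idea behind a skewed join, for the large-key case described above: records of a known hot key from one input are spread round-robin over several reducers, so no single reducer holds the whole bag, while matching records from the other input are copied to all of those reducers, so every pair still meets exactly once. The hot-key table (in practice estimated by sampling the key distribution) and the `SkewPartitioner` class are hypothetical; this is not Pig's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical skew-aware partitioner (illustration only, not Pig's skewed join). */
public class SkewPartitioner {
    private final int numReducers;
    // Hot key -> {first reducer, number of reducers allotted}; assumed known in advance.
    private final Map<String, int[]> hotKeys;
    private long roundRobin = 0;                 // shared round-robin counter

    public SkewPartitioner(int numReducers, Map<String, int[]> hotKeys) {
        this.numReducers = numReducers;
        this.hotKeys = hotKeys;
    }

    private int hashReducer(String key) {
        return Math.floorMod(key.hashCode(), numReducers);
    }

    /** Skewed input: each record of a hot key goes to one of that key's reducers in turn. */
    public int partitionLeft(String key) {
        int[] range = hotKeys.get(key);
        if (range == null) return hashReducer(key);
        int slot = (int) (roundRobin++ % range[1]);
        return (range[0] + slot) % numReducers;
    }

    /** Other input: a hot-key record is copied to every reducer holding part of that key. */
    public List<Integer> partitionRight(String key) {
        List<Integer> targets = new ArrayList<>();
        int[] range = hotKeys.get(key);
        if (range == null) {
            targets.add(hashReducer(key));
        } else {
            for (int i = 0; i < range[1]; i++) targets.add((range[0] + i) % numReducers);
        }
        return targets;
    }

    public static void main(String[] args) {
        SkewPartitioner p =
                new SkewPartitioner(10, Collections.singletonMap("hot", new int[] {2, 4}));
        int[] perReducer = new int[10];
        for (int i = 0; i < 1_000; i++) perReducer[p.partitionLeft("hot")]++;
        System.out.println(Arrays.toString(perReducer));
        System.out.println(p.partitionRight("hot"));
    }
}
```

Here the 1,000 records of the hot key land 250 apiece on reducers 2 through 5 instead of all 1,000 on one reducer, and the other input's records for that key go to `[2, 3, 4, 5]`.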

Daniel
