Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2019/07/05 18:40:00 UTC

[jira] [Commented] (PIG-5390) Avoid adding self-spilling bags to SpillableMemoryManager

    [ https://issues.apache.org/jira/browse/PIG-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879477#comment-16879477 ] 

Rohini Palaniswamy commented on PIG-5390:
-----------------------------------------

bq. The question here would be: shall we stop adding InternalSortedBag and InternalDistinctBag to SpillableMemoryManager?
No, we cannot take out that spill. There could be multiple bags, user UDFs, or multiple input and output sort buffers for a vertex in Tez causing memory pressure, and we will have to spill to avoid an OOM. Proactive spill will not kick in in that case, because no single bag has to exceed its own memory limit for the total to become too large. We need proactive spill as well, because we don't want a bag to grow too far beyond its memory limit and end up causing a full spill of all bags.
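
As a rough illustration of why both triggers are needed, here is a minimal Java sketch, assuming a simplified model: `DualSpillBag` and `ToyMemoryManager` are hypothetical classes, not Pig's InternalSortedBag or SpillableMemoryManager API, and "spilling" only moves items into a counter instead of writing files. Three bags each stay under their own limit, so no proactive spill fires, yet together they exceed the global budget and only the manager-driven spill frees memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: NOT Pig's InternalSortedBag or SpillableMemoryManager.
// Single-threaded: assumes only the owning thread touches a bag.
public class SpillDemo {

    // A bag with two spill triggers: its own per-bag limit (proactive)
    // and an external request from a memory manager (reactive).
    static class DualSpillBag {
        private final int perBagLimit;               // max items kept in memory
        private final List<Long> inMemory = new ArrayList<>();
        private long spilledItems = 0;               // stands in for data written to spill files
        private int spillRuns = 0;                   // number of simulated spill files

        DualSpillBag(int perBagLimit) { this.perBagLimit = perBagLimit; }

        void add(long value) {
            inMemory.add(value);
            if (inMemory.size() >= perBagLimit) {
                spill();                             // proactive: bag reached its own limit
            }
        }

        // Invoked proactively by add() or reactively by the memory manager.
        long spill() {
            long n = inMemory.size();
            if (n == 0) return 0;
            spilledItems += n;
            spillRuns++;
            inMemory.clear();
            return n;
        }

        int inMemorySize() { return inMemory.size(); }
        long spilledItems() { return spilledItems; }
        int spillRuns() { return spillRuns; }
    }

    // Spills every registered bag once their combined in-memory size exceeds
    // a global budget, even if no single bag reached its own limit.
    // (Spilling all bags at once is a simplification.)
    static class ToyMemoryManager {
        private final int globalBudget;
        private final List<DualSpillBag> bags = new ArrayList<>();

        ToyMemoryManager(int globalBudget) { this.globalBudget = globalBudget; }

        void register(DualSpillBag bag) { bags.add(bag); }

        long checkAndSpill() {
            long total = 0;
            for (DualSpillBag b : bags) total += b.inMemorySize();
            if (total <= globalBudget) return 0;
            long freed = 0;
            for (DualSpillBag b : bags) freed += b.spill();
            return freed;
        }
    }

    public static void main(String[] args) {
        ToyMemoryManager mgr = new ToyMemoryManager(100);
        List<DualSpillBag> bags = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            DualSpillBag b = new DualSpillBag(50);
            mgr.register(b);
            bags.add(b);
        }
        // 40 items per bag: each stays below its own limit of 50, so no
        // proactive spill fires, yet 3 * 40 = 120 exceeds the budget of 100.
        for (DualSpillBag b : bags) {
            for (long v = 0; v < 40; v++) b.add(v);
        }
        System.out.println("freed=" + mgr.checkAndSpill()); // prints "freed=120"
    }
}
```

Conversely, a single `DualSpillBag(50)` fed 120 items spills twice on its own and keeps 20 items in memory without any manager involvement, which is the proactive path that keeps one bag from growing unchecked.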

 We should just remove the misleading comment that these bags don't spill via SpillableMemoryManager. 

> Avoid adding self-spilling bags to SpillableMemoryManager 
> ----------------------------------------------------------
>
>                 Key: PIG-5390
>                 URL: https://issues.apache.org/jira/browse/PIG-5390
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>
> This is a follow-up to PIG-5380, where [~rohini] pointed out:
> {quote}
> I think the same change is required in InternalSortedBag as well, since the code is exactly the same and it can spill too (https://github.com/apache/pig/blob/trunk/src/org/apache/pig/data/InternalSortedBag.java#L133). We most likely haven't seen issues with it because the probability is very low: it will proactively spill if it exceeds the cached memory limit.
> {quote}
> Looking at the history and the source, this is a critical bug, given that all these self-spilling bags are designed on the premise that no other threads would touch them. The comment in the source clearly says:
> {code}
>  * This bag is not registered with SpillableMemoryManager. It calculates
>  * the number of tuples to hold in memory and spill pro-actively into files.
> {code}
>  
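
The premise quoted above (a self-spilling bag touched only by its owning thread) stops holding once the bag is registered with a memory manager, because the manager's spill request typically arrives on a different thread. The Java sketch below shows one hypothetical way to guard such a buffer: every read or write of the in-memory list and the spill counter happens under a single lock, so concurrent `add()` and `spill()` calls cannot lose records. `GuardedBuffer` and `run` are illustrative names, not Pig code, and this is not necessarily how PIG-5380 or PIG-5390 were fixed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Pig's code: a self-spilling buffer that is also
// reachable from a memory-manager thread, so all state is guarded by one lock.
public class ConcurrentSpillDemo {

    static class GuardedBuffer {
        private final int perBufferLimit;
        private final List<Long> inMemory = new ArrayList<>();
        private long spilledItems = 0; // stands in for records written to spill files

        GuardedBuffer(int perBufferLimit) { this.perBufferLimit = perBufferLimit; }

        // Owner thread: add, and spill proactively when over the local limit.
        void add(long value) {
            synchronized (inMemory) {
                inMemory.add(value);
                if (inMemory.size() >= perBufferLimit) {
                    spillLocked();
                }
            }
        }

        // Manager thread: may call this at any time.
        long spill() {
            synchronized (inMemory) {
                return spillLocked();
            }
        }

        // Caller must hold the lock on inMemory.
        private long spillLocked() {
            long n = inMemory.size();
            spilledItems += n;
            inMemory.clear();
            return n;
        }

        // Records still in memory plus records already spilled.
        long totalItems() {
            synchronized (inMemory) {
                return inMemory.size() + spilledItems;
            }
        }
    }

    // Runs an owner thread and a simulated manager thread concurrently and
    // returns how many records the buffer accounts for afterwards.
    static long run(int items) {
        GuardedBuffer buf = new GuardedBuffer(1000);
        Thread owner = new Thread(() -> {
            for (long i = 0; i < items; i++) buf.add(i);
        });
        Thread manager = new Thread(() -> {
            // Simulates repeated low-memory notifications arriving concurrently.
            for (int i = 0; i < 10_000; i++) buf.spill();
        });
        owner.start();
        manager.start();
        try {
            owner.join();
            manager.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return buf.totalItems();
    }

    public static void main(String[] args) {
        System.out.println("total=" + run(200_000)); // prints "total=200000"
    }
}
```

With the lock in place, every added record is counted exactly once, either in memory or as spilled. Removing the `synchronized` blocks would turn this into a race: `ArrayList` is not thread-safe, so a run may drop records or fail with an exception.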


