Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2009/10/26 22:20:59 UTC

[jira] Commented: (PIG-1037) better memory layout and spill for sorted and distinct bags

    [ https://issues.apache.org/jira/browse/PIG-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770206#action_12770206 ] 

Alan Gates commented on PIG-1037:
---------------------------------

Comments:

In InternalSortedBag.add, you are calculating the average size every time you add a tuple, for the first 100 tuples. Rather than doing the calculation every time, wouldn't it be better to wait until you reach 100 tuples and then calculate the average? This would miss the case where you can store fewer than 100 tuples in memory, but that seems unlikely.
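A minimal Java sketch of the suggestion above, assuming a bag that keeps tuples in an in-memory list and takes a caller-supplied size function. The class `SampledBag`, the constant `SAMPLE_SIZE`, and the method names are hypothetical illustrations, not Pig's actual `InternalSortedBag` code. Each `add` only accumulates a running sum for the first 100 tuples; the division happens exactly once, when the sample is complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

/**
 * Hypothetical sketch (not Pig's InternalSortedBag): sample the size of the
 * first SAMPLE_SIZE tuples, but compute the average only once, when the
 * sample is complete, instead of recomputing it on every add().
 */
class SampledBag<T> {
    // Number of tuples to sample before computing the average.
    static final int SAMPLE_SIZE = 100;

    private final List<T> contents = new ArrayList<>();
    private final ToLongFunction<T> sizer; // estimates the memory size of one tuple
    private long sampledBytes = 0;         // running sum of sampled tuple sizes
    private int sampled = 0;               // how many tuples have been sampled so far
    private long avgTupleSize = -1;        // -1 means "not computed yet"

    SampledBag(ToLongFunction<T> sizer) {
        this.sizer = sizer;
    }

    void add(T t) {
        contents.add(t);
        if (sampled < SAMPLE_SIZE) {
            // Cheap per-add work: only accumulate the size.
            sampledBytes += sizer.applyAsLong(t);
            sampled++;
            if (sampled == SAMPLE_SIZE) {
                // The single division, done once.
                avgTupleSize = sampledBytes / SAMPLE_SIZE;
            }
        }
    }

    long averageTupleSize() {
        return avgTupleSize;
    }

    int size() {
        return contents.size();
    }

    public static void main(String[] args) {
        // Toy size function: 2 bytes per character.
        SampledBag<String> bag = new SampledBag<>(s -> 2L * s.length());
        for (int i = 0; i < 150; i++) {
            bag.add("tuple-" + i);
        }
        System.out.println("estimated bytes per tuple: " + bag.averageTupleSize());
    }
}
```

Until 100 tuples have been added, `averageTupleSize()` returns -1; that is the fewer-than-100-tuples case the comment accepts as unlikely. A caller that needs an estimate earlier could fall back to averaging over the partial sample.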

Some of the comments in InternalSortedBag that were copied over from the previous code, such as those dealing with spills in the midst of reading, are no longer true. They should be removed, since they will cause confusion about how the code works.

I think the synchronized blocks in InternalSortedBag can be removed. They were there before because spills could be triggered by a separate thread. Since that is no longer true, we should be able to remove them. This will eliminate a lock/unlock on every read of a record out of the bag and should provide some speedup.
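A minimal Java sketch of what a lock-free read path could look like, assuming (as the comment states) that spills are no longer triggered from a separate thread, so the thread that adds to the bag is also the only one that spills and reads it. `SingleThreadSortedBag` and its methods are hypothetical names, not Pig's API, and spilling is omitted entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch, not Pig's code. Assumes a single owning thread adds,
 * spills, and reads, so no per-record lock is taken on the read path.
 */
class SingleThreadSortedBag<T extends Comparable<T>> implements Iterable<T> {
    private final List<T> contents = new ArrayList<>();
    private boolean sorted = false;

    void add(T t) {
        contents.add(t);
        sorted = false;
    }

    @Override
    public Iterator<T> iterator() {
        if (!sorted) {
            Collections.sort(contents);
            sorted = true;
        }
        return new Iterator<T>() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                // A version that had to tolerate a concurrent spiller would
                // wrap this body in a synchronized block.
                return pos < contents.size();
            }

            @Override
            public T next() {
                // No lock/unlock per record; safe only under the
                // single-thread assumption stated above.
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return contents.get(pos++);
            }
        };
    }

    public static void main(String[] args) {
        SingleThreadSortedBag<Integer> bag = new SingleThreadSortedBag<>();
        bag.add(3);
        bag.add(1);
        bag.add(2);
        for (Integer x : bag) {
            System.out.println(x);
        }
    }
}
```

The design choice is simply to drop the monitor: if a second thread could still spill `contents` while this iterator runs, this version would be unsafe.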



> better memory layout and spill for sorted and distinct bags
> -----------------------------------------------------------
>
>                 Key: PIG-1037
>                 URL: https://issues.apache.org/jira/browse/PIG-1037
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Ying He
>         Attachments: PIG-1037.patch, PIG-1037.patch2
>
>


-- 
This message is automatically generated by JIRA.