Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2014/02/09 03:38:19 UTC

[jira] [Commented] (PIG-3456) Reduce threadlocal conf access in backend for each record

    [ https://issues.apache.org/jira/browse/PIG-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895809#comment-13895809 ] 

Rohini Palaniswamy commented on PIG-3456:
-----------------------------------------

Also posted on Review Board: https://reviews.apache.org/r/17876/

[~dvryaboy],
    PIG-3730 has some analysis and stack traces showing how thread-local access of the Configuration for each record affects performance. Many places in the Pig code already avoid the per-record lookup; I just fixed the places that did not. I originally wrote this patch for 0.11, when the Pigmix numbers for 0.11 were worse than 0.10 even with PIG-2923 reverted. I could not figure out the exact reason, but since I was short of time I tried fixing things I came across, like this one, to bring it level with 0.10. The thread-local fix did not make much difference in my Pigmix test, but removing the extra boolean in DefaultTuple gave around 3-5%. According to PIG-3730, however, the thread-local access does make a difference when a lot of GC is happening.
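
    To illustrate the pattern being discussed, here is a minimal sketch in plain Java. It is not the patch and does not use Pig's classes: CachedConfLookup, CONF, getConf, the "bag.type" key, and both operator classes are hypothetical names made up for this example. It only contrasts reading a ThreadLocal-held configuration for every record with reading it once and caching the value in a field.

```java
import java.util.HashMap;
import java.util.Map;

public class CachedConfLookup {
    // Hypothetical stand-in for a per-thread job configuration.
    static final ThreadLocal<Map<String, String>> CONF =
            ThreadLocal.withInitial(HashMap::new);

    // Counts ThreadLocal reads so the two variants can be compared.
    static int confReads = 0;

    static Map<String, String> getConf() {
        confReads++;
        return CONF.get();
    }

    // Variant A: goes back to the ThreadLocal conf for every record.
    static class PerRecordOperator {
        String process(String record) {
            String bagType = getConf().getOrDefault("bag.type", "default");
            return bagType + ":" + record;
        }
    }

    // Variant B: reads the conf on the first record only, then reuses the value.
    static class CachedOperator {
        private String bagType; // null until the first record is processed

        String process(String record) {
            if (bagType == null) {
                bagType = getConf().getOrDefault("bag.type", "default");
            }
            return bagType + ":" + record;
        }
    }

    // Runs one variant over the given number of records and
    // returns how many times the ThreadLocal conf was read.
    static int countReads(boolean cached, int records) {
        confReads = 0;
        PerRecordOperator perRecord = new PerRecordOperator();
        CachedOperator cachedOp = new CachedOperator();
        for (int i = 0; i < records; i++) {
            if (cached) {
                cachedOp.process("r" + i);
            } else {
                perRecord.process("r" + i);
            }
        }
        return confReads;
    }

    public static void main(String[] args) {
        System.out.println("per-record reads: " + countReads(false, 1000));
        System.out.println("cached reads: " + countReads(true, 1000));
    }
}
```

    In this toy version the cached variant touches the ThreadLocal once per operator instance rather than once per record. Whether that shows up as a measurable gain in a real job depends on the workload, as noted above.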

> Reduce threadlocal conf access in backend for each record
> ---------------------------------------------------------
>
>                 Key: PIG-3456
>                 URL: https://issues.apache.org/jira/browse/PIG-3456
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.11.1
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.13.0
>
>         Attachments: PIG-3456-1-no-whitespace.patch, PIG-3456-1.patch
>
>
> Noticed a few things while browsing the code:
> 1) DefaultTuple has a field, protected boolean isNull = false;, that is never used. Removing it gives a ~3-5% improvement for big jobs.
> 2) Config checks against the ThreadLocal conf are repeated for each record, for example in createDataBag in POCombinerPackage. In other places, such as POPackage and POJoinPackage, the value is initialized only the first time.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)