Posted to dev@pig.apache.org by "Sergey (JIRA)" <ji...@apache.org> on 2013/08/02 12:51:48 UTC

[jira] [Commented] (PIG-3409) org.apache.pig.data.DefaultTuple hashcode perfomance issue

    [ https://issues.apache.org/jira/browse/PIG-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727559#comment-13727559 ] 

Sergey commented on PIG-3409:
-----------------------------

http://bigdatapath.com/wp-content/uploads/2013/08/hash_code_perfomance_issue.png

The screenshot above shows VisualVM profiling a run in local mode.
I'm joining two 100 MB data sets with a replicated join on four int fields.

In cluster mode (18 reducers, 32 cores, -Xmx3072m per task), joining 6 GB of data (6 GB / 18 per task) with 100 MB of data on the same four fields takes ~30 min.

                
> org.apache.pig.data.DefaultTuple hashcode perfomance issue
> ----------------------------------------------------------
>
>                 Key: PIG-3409
>                 URL: https://issues.apache.org/jira/browse/PIG-3409
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.11
>            Reporter: Sergey
>            Priority: Critical
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> I've run into a serious performance issue.
> Please see the VisualVM screenshot.
> Here is the hashCode implementation from the class:
> {code}
>     @Override
>     public int hashCode() {
>         int hash = 17;
>         for (Iterator<Object> it = mFields.iterator(); it.hasNext();) {
>             Object o = it.next();
>             if (o != null) {
>                 hash = 31 * hash + o.hashCode();
>             }
>         }
>         return hash;
>     }
> {code}
> I don't see any reason to iterate over the whole tuple and aggregate the hash value on every call before returning it.
> I can fix it if it's possible for me to take part in the dev process. I'm new to it :(
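
One possible direction, sketched under assumptions: keep the field-by-field formula quoted above (a hash that covers every field is what lets tuples differing in any join key land in different buckets) but compute it once and reuse it until the tuple changes. The class below is hypothetical and is not Pig's DefaultTuple; the names CachedHashTuple, fullHash, cachedHash and hashValid are made up for illustration. It also assumes the field objects themselves are not mutated after insertion, which holds for boxed ints like the join keys described here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration only; this is NOT org.apache.pig.data.DefaultTuple.
 * It caches the hash and invalidates the cache whenever the tuple is mutated.
 * Assumption: field objects are immutable (e.g. Integer), so their own
 * hash codes never change while they sit in the tuple.
 */
final class CachedHashTuple {
    private final List<Object> fields = new ArrayList<>();
    private int cachedHash;
    private boolean hashValid = false;

    public void append(Object value) {
        fields.add(value);
        hashValid = false; // any mutation invalidates the cached hash
    }

    public void set(int index, Object value) {
        fields.set(index, value);
        hashValid = false;
    }

    /** Same formula as the hashCode quoted in the issue: null fields are skipped. */
    static int fullHash(List<Object> fields) {
        int hash = 17;
        for (Object o : fields) {
            if (o != null) {
                hash = 31 * hash + o.hashCode();
            }
        }
        return hash;
    }

    @Override
    public int hashCode() {
        if (!hashValid) {
            cachedHash = fullHash(fields); // full pass only after a change
            hashValid = true;
        }
        return cachedHash;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CachedHashTuple
                && fields.equals(((CachedHashTuple) other).fields);
    }
}
```

With this shape, repeated hash lookups on an unchanged tuple cost a single field read instead of a pass over all fields. Whether that helps in practice depends on how often the same tuple instance is hashed more than once, which the profile would need to confirm.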
