Posted to dev@hive.apache.org by "Ning Zhang (JIRA)" <ji...@apache.org> on 2009/11/23 21:41:40 UTC
[jira] Created: (HIVE-949) Object deepCopy in GroupBy Operator
Object deepCopy in GroupBy Operator
-----------------------------------
Key: HIVE-949
URL: https://issues.apache.org/jira/browse/HIVE-949
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Ning Zhang
In GroupByOperator, objects are first deep copied and only then checked against the hash table (in hash-mode aggregation). Object deep copy can be very expensive (around 5% of CPU time). A simple change would be to generate the object through the ObjectInspector without a deep copy and check whether it already exists in the hash table. Only if it does not exist do we call deep copy.
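A minimal sketch of the idea, not the actual patch and not Hive code: the class name, the `deepCopy` helper, and the `deepCopies` counter are hypothetical, and a plain `List<String>` stands in for a key produced through an ObjectInspector. The reusable key buffer is probed against the hash table as-is, and the copy is made only on a miss, so the number of copies tracks the number of distinct keys rather than the number of rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LazyCopyGroupBy {
    // Hypothetical counter, present only to make the saving observable.
    static int deepCopies = 0;

    // Stand-in for a deep copy performed through an ObjectInspector.
    static List<String> deepCopy(List<String> key) {
        deepCopies++;
        return new ArrayList<>(key);
    }

    // Counts rows per grouping key. reusedKey is overwritten for every row,
    // the way a reader-owned row object would be.
    static Map<List<String>, long[]> countByKey(String[][] rows) {
        Map<List<String>, long[]> hashTable = new HashMap<>();
        List<String> reusedKey = new ArrayList<>();
        for (String[] row : rows) {
            reusedKey.clear();
            for (String field : row) {
                reusedKey.add(field);
            }
            // Probe with the uncopied key: List equality and hashing are by content.
            long[] agg = hashTable.get(reusedKey);
            if (agg == null) {
                // Miss: pay for the deep copy now, because the table must own its key.
                agg = new long[1];
                hashTable.put(deepCopy(reusedKey), agg);
            }
            // Hit or miss, update the aggregation buffer in place.
            agg[0]++;
        }
        return hashTable;
    }
}
```

With the copy-first ordering described above, every input row would trigger a deep copy; here a row whose key is already in the table costs only a hash lookup.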
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-949) Object deepCopy in GroupBy Operator
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain resolved HIVE-949.
-----------------------------
Resolution: Fixed
Fix Version/s: 0.5.0
Hadoop Flags: [Reviewed]
Committed. Thanks, Yongqiang.
[jira] Commented: (HIVE-949) Object deepCopy in GroupBy Operator
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783041#action_12783041 ]
Namit Jain commented on HIVE-949:
---------------------------------
looks good
+1
will commit if the tests pass
[jira] Commented: (HIVE-949) Object deepCopy in GroupBy Operator
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781750#action_12781750 ]
He Yongqiang commented on HIVE-949:
-----------------------------------
no problem.
[jira] Updated: (HIVE-949) Object deepCopy in GroupBy Operator
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-949:
------------------------------
Attachment: hive-949-2009-11-26.patch
This saves about 45% CPU time of GroupByOperator.processHashAggr. (393,686 ms -> 216,860 ms).
[jira] Commented: (HIVE-949) Object deepCopy in GroupBy Operator
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781752#action_12781752 ]
Namit Jain commented on HIVE-949:
---------------------------------
Also, get some performance numbers for this
[jira] Commented: (HIVE-949) Object deepCopy in GroupBy Operator
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783198#action_12783198 ]
Zheng Shao commented on HIVE-949:
---------------------------------
This change might increase the amount of memory and the number of objects created per distinct key.
HIVE-535 has a proposal that should save both CPU time (mainly object creation time), and memory space.
[jira] Assigned: (HIVE-949) Object deepCopy in GroupBy Operator
Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ning Zhang reassigned HIVE-949:
-------------------------------
Assignee: He Yongqiang
Yongqiang, could you please take a look at this as we discussed?