Posted to issues@hive.apache.org by "slim bouguerra (JIRA)" <ji...@apache.org> on 2018/02/09 04:57:00 UTC

[jira] [Comment Edited] (HIVE-16125) Split work between reducers.

    [ https://issues.apache.org/jira/browse/HIVE-16125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357920#comment-16357920 ] 

slim bouguerra edited comment on HIVE-16125 at 2/9/18 4:56 AM:
---------------------------------------------------------------

To fix this, I added a new table property that the user can set as an extra hashing salt to further split the reduce sink.

For instance, during the create statement the user can add the property
{code}"druid.segment.targetShardPerGranularity"="6"{code}
to add random keys between 0 and 5, so that each segment granularity will get up to 6 reducers.

FYI, I am still unsure whether insert statements will see the same benefit. When using this feature, the user has to choose the target number of shards per segment granularity wisely: if the number is too high the segments will be too small, and if it is too low the segments will be huge. A further improvement could be to use statistics, or to add an extra shuffle/reduce stage that counts and partitions the rows according to some target partition size.
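To make the usage concrete, a CTAS statement along the following lines would set the property. This is only a minimal sketch: the storage handler class, the "druid.segment.granularity" property, and the __time column convention come from the existing Hive/Druid integration, while the table name, source table, and columns are made-up placeholders, not part of this patch.

{code}
-- Hypothetical CTAS into a Druid-backed table; table and column names are placeholders.
CREATE TABLE druid_sales
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "DAY",
  -- adds a random salt in [0, 6) to the reduce sink key,
  -- so each day's data can be written by up to 6 reducers
  "druid.segment.targetShardPerGranularity" = "6"
)
AS
SELECT
  CAST(sales_time AS timestamp) AS `__time`,
  store_id,
  amount
FROM raw_sales;
{code}

With "DAY" granularity and a target of 6, the rows for each day would be spread over up to 6 reducers and hence up to 6 segments per day, instead of all funneling into a single reducer.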



> Split work between reducers.
> ----------------------------
>
>                 Key: HIVE-16125
>                 URL: https://issues.apache.org/jira/browse/HIVE-16125
>             Project: Hive
>          Issue Type: Bug
>          Components: Druid integration
>            Reporter: slim bouguerra
>            Assignee: slim bouguerra
>            Priority: Major
>         Attachments: HIVE-16125.patch
>
>
> Split work between reducers.
> Currently we have one reducer per segment granularity, even if the interval will be partitioned over multiple partitions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)