Posted to issues@hive.apache.org by "Vihang Karajgaonkar (JIRA)" <ji...@apache.org> on 2018/05/03 17:40:00 UTC

[jira] [Commented] (HIVE-19344) Change default value of msck.repair.batch.size

    [ https://issues.apache.org/jira/browse/HIVE-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462838#comment-16462838 ] 

Vihang Karajgaonkar commented on HIVE-19344:
--------------------------------------------

I ran some performance numbers for msck and found that the performance gains plateau once the batch size goes beyond 3000, on both S3 and HDFS. Of course, these numbers depend heavily on the environment, but a value of 0 is almost always bad on large setups. On smaller setups, 3000 is a reasonable batch size because all the partitions are still added together in one shot, just as they were before the patch.
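To make the trade-off concrete, the arithmetic below counts how many add-partition calls msck would send to HMS for a given batch size. This is a minimal sketch: `hms_calls` is a hypothetical helper, not Hive code, and it models only the number of calls, not the timings measured above.

```python
import math


def hms_calls(num_partitions: int, batch_size: int) -> int:
    """Hypothetical helper: number of add-partition calls msck would issue.

    Follows the msck.repair.batch.size convention described in this issue,
    where 0 means "send everything in a single call". Negative values are
    treated the same as 0 in this sketch.
    """
    if num_partitions <= 0:
        return 0
    if batch_size <= 0:
        return 1  # all partitions in one API call
    return math.ceil(num_partitions / batch_size)


for n in (500, 3000, 100_000):
    # Columns: partitions, calls with batch size 0, calls with batch size 3000
    print(n, hms_calls(n, 0), hms_calls(n, 3000))
```

With a batch size of 3000, a table with up to 3000 new partitions still gets a single call, while 100,000 partitions are split into 34 smaller calls instead of one very large one.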

> Change default value of msck.repair.batch.size 
> -----------------------------------------------
>
>                 Key: HIVE-19344
>                 URL: https://issues.apache.org/jira/browse/HIVE-19344
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Minor
>         Attachments: HIVE-19344.01.patch
>
>
> {{msck.repair.batch.size}} defaults to 0, which means msck will try to add all the partitions to HMS in one API call. This can potentially put huge memory pressure on HMS. The default value should be changed to a reasonable number so that, when there is a large number of partitions, the partition additions are batched. The same goes for {{msck.repair.batch.max.retries}}.
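A minimal sketch of batched partition addition with a retry limit, assuming a callback `add_batch` that stands in for an HMS add-partitions call. The names `add_partitions_in_batches` and `add_batch`, and the policy of halving the batch size after each failed call, are assumptions for illustration only. This is not the Hive implementation and may differ from how msck actually interprets `msck.repair.batch.max.retries`.

```python
from typing import Callable, List, Sequence


def add_partitions_in_batches(
    partitions: Sequence[str],
    batch_size: int,
    max_retries: int,
    add_batch: Callable[[List[str]], None],
) -> int:
    """Hypothetical sketch: add partitions in batches, shrinking on failure.

    batch_size <= 0 means "all partitions in one call" (as described above).
    After a failed call the batch size is halved (never below 1) and the
    same partitions are retried. Failures are counted across the whole run;
    once they exceed max_retries, the last error is re-raised.
    Returns the number of successful add_batch calls.
    """
    size = batch_size if batch_size > 0 else len(partitions)
    failures = 0
    start = 0
    calls = 0
    while start < len(partitions):
        batch = list(partitions[start:start + size])
        try:
            add_batch(batch)
        except Exception:
            failures += 1
            if failures > max_retries:
                raise
            size = max(1, size // 2)  # retry the same range with a smaller batch
            continue
        start += len(batch)
        calls += 1
    return calls


# Usage: a fake HMS that rejects any batch larger than 1000 partitions.
received: List[int] = []


def fake_add(batch: List[str]) -> None:
    if len(batch) > 1000:
        raise RuntimeError("simulated HMS memory pressure")
    received.append(len(batch))


parts = [f"ds={i}" for i in range(5000)]
calls = add_partitions_in_batches(parts, 3000, 3, fake_add)
print(calls, received)
```

Here the batch size drops from 3000 to 1500 to 750 after two failures, and the remaining partitions go through in batches of 750. Halving is only one possible design; it trades a few wasted calls for not needing to know the server's limit in advance.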



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)