Posted to issues@hive.apache.org by "Vihang Karajgaonkar (JIRA)" <ji...@apache.org> on 2017/02/22 22:53:44 UTC

[jira] [Assigned] (HIVE-16014) HiveMetastoreChecker should use hive.metastore.fshandler.threads instead of hive.mv.files.thread for pool size

     [ https://issues.apache.org/jira/browse/HIVE-16014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vihang Karajgaonkar reassigned HIVE-16014:
------------------------------------------


> HiveMetastoreChecker should use hive.metastore.fshandler.threads instead of hive.mv.files.thread for pool size
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16014
>                 URL: https://issues.apache.org/jira/browse/HIVE-16014
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>
> HiveMetastoreChecker uses the hive.mv.files.thread configuration value to determine the pool size, as shown below:
> {noformat}
> private void checkPartitionDirs(Path basePath, Set<Path> allDirs, int maxDepth) throws IOException, HiveException {
>     ConcurrentLinkedQueue<Path> basePaths = new ConcurrentLinkedQueue<>();
>     basePaths.add(basePath);
>     Set<Path> dirSet = Collections.newSetFromMap(new ConcurrentHashMap<Path, Boolean>());
>     // Here we just reuse the THREAD_COUNT configuration for
>     // HIVE_MOVE_FILES_THREAD_COUNT
>     int poolSize = conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 15);
>     // Check if too low config is provided for move files. 2x CPU is reasonable max count.
>     poolSize = poolSize == 0 ? poolSize : Math.max(poolSize,
>         Runtime.getRuntime().availableProcessors() * 2);
> {noformat}
> msck is commonly used to add the missing partitions for the table from the Filesystem. In such a case different pool sizes for HMSHandler and HiveMetastoreChecker can affect the performance. E.g., if {{hive.metastore.fshandler.threads}} is set to a lower value like 15 and {{hive.mv.files.thread}} is much higher like 100, or vice versa, the smaller pool will become the bottleneck. It would be good to use {{hive.metastore.fshandler.threads}} to size the pool for HiveMetastoreChecker since the number of missing partitions and the number of partitions to be added will most likely be the same. In such a case the performance of the query will be optimal when both pool sizes are the same.
> Since both configs can be tuned individually, it is very likely that they will differ. But since there is a strong correlation between the amount of work done by HiveMetastoreChecker and the HiveMetastore.add_partitions call, it might be a good idea to use {{hive.metastore.fshandler.threads}} for the pool size instead of {{hive.mv.files.thread}}.
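
A minimal sketch of the proposed sizing, not the actual patch. It assumes the existing logic is kept as quoted above (0 is passed through, any other value is raised to at least 2x the CPU count) and that only the property being read changes. The class name CheckerPoolSize, the helper poolSize, and the plain Map standing in for HiveConf are hypothetical. The fallback of 15 is copied from the quoted snippet.

```java
import java.util.HashMap;
import java.util.Map;

public class CheckerPoolSize {
    // Property name taken from the issue text.
    static final String FS_HANDLER_THREADS = "hive.metastore.fshandler.threads";
    // Assumption: same fallback value the quoted snippet passes to conf.getInt.
    static final int DEFAULT_THREADS = 15;

    // Hypothetical helper: a plain Map stands in for HiveConf, and the CPU
    // count is a parameter so the result does not depend on the host.
    static int poolSize(Map<String, String> conf, int cpus) {
        int configured = Integer.parseInt(
            conf.getOrDefault(FS_HANDLER_THREADS, String.valueOf(DEFAULT_THREADS)));
        // Mirrors the quoted logic: 0 is returned unchanged, any other
        // value is raised to at least twice the CPU count.
        return configured == 0 ? 0 : Math.max(configured, cpus * 2);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FS_HANDLER_THREADS, "100");
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("checker pool size: " + poolSize(conf, cpus));
    }
}
```

With this sizing the checker pool follows the same setting as the handler pool, so tuning {{hive.metastore.fshandler.threads}} alone would move both.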



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)