Posted to issues@hive.apache.org by "Sergio Peña (JIRA)" <ji...@apache.org> on 2017/02/10 22:39:41 UTC

[jira] [Commented] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

    [ https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861943#comment-15861943 ] 

Sergio Peña commented on HIVE-15881:
------------------------------------

[~ashutoshc] [~poeppt] [~stakiar] What do you think about this change? We have found this variable name a little confusing because it looks like a Hadoop-specific variable, while the Utilities class is used only by Hive. The new and old variables will do the same thing throughout Hive 2.x.

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-15881
>                 URL: https://issues.apache.org/jira/browse/HIVE-15881
>             Project: Hive
>          Issue Type: Task
>          Components: Query Planning
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Minor
>
> The Utilities class has two methods, {{getInputSummary}} and {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} to get the summary of a list of input locations in parallel. These methods are Hive-related, but the variable name does not look like it is specific to Hive.
> Also, the above variable is neither defined in HiveConf nor used anywhere else. The only other reference I found is in the Hadoop MR1 code.
> I'd like to propose deprecating {{mapred.dfsclient.parallelism.max}} and using a different variable name, such as {{hive.get.input.listing.num.threads}}, that reflects the intention of the variable. The old variable might then be removed in Hive 3.x.
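
To illustrate the proposal, here is a minimal sketch of how both names could be honored during Hive 2.x. It is not the actual Hive patch: the class {{InputListingThreads}} and the method {{resolve}} are hypothetical, {{java.util.Properties}} stands in for the real configuration object, and two behaviors are assumptions: the new name takes precedence when both are set, and 0 is returned when neither is set.

```java
import java.util.Properties;

/** Hypothetical helper for illustration only; not part of Hive. */
final class InputListingThreads {
    // New, Hive-specific name proposed in this issue.
    static final String NEW_KEY = "hive.get.input.listing.num.threads";
    // Old name that the issue proposes to deprecate.
    static final String OLD_KEY = "mapred.dfsclient.parallelism.max";

    private InputListingThreads() {}

    /**
     * Resolves the thread count from the configuration.
     * Assumption: the new key wins when both are set, and 0 is
     * returned when neither key is present.
     */
    static int resolve(Properties conf) {
        String value = conf.getProperty(NEW_KEY);
        if (value == null) {
            // Fall back to the deprecated name so existing setups keep working.
            value = conf.getProperty(OLD_KEY);
        }
        if (value == null) {
            return 0;
        }
        // Clamp negative values to 0.
        return Math.max(0, Integer.parseInt(value.trim()));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(OLD_KEY, "8");
        System.out.println(resolve(conf)); // prints 8 (old name still honored)

        conf.setProperty(NEW_KEY, "16");
        System.out.println(resolve(conf)); // prints 16 (new name takes precedence)
    }
}
```

With a fallback of this kind, configurations that still set the old name keep working until it is removed, while new configurations can use the Hive-specific name.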