Posted to common-issues@hadoop.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2015/01/22 23:45:35 UTC

[jira] [Commented] (HADOOP-11506) Configuration.get() is unnecessarily slow

    [ https://issues.apache.org/jira/browse/HADOOP-11506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288362#comment-14288362 ] 

Dmitriy V. Ryaboy commented on HADOOP-11506:
--------------------------------------------

Most properties are not subject to variable substitution, so lookups of those properties exit substituteVars() at the following code block:

{code}
if (!match.find()) {
  return eval;
}
{code}

Getting there requires creating a matcher, allocating a HashSet, and evaluating the regex:
{code}
private static final Pattern VAR_PATTERN =
    Pattern.compile("\\$\\{[^\\}\\$\u0020]+\\}");
{code}
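To make concrete what this regex accepts, here is a small standalone sketch that reuses the VAR_PATTERN expression quoted above. The class name VarPatternDemo, the helper firstVar(), and the sample values are hypothetical and for illustration only. The pattern matches "${", one or more characters that are not '}', '$' or a space, then "}".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows which strings VAR_PATTERN treats as
// variable references.
class VarPatternDemo {
    // Same regex as in the comment above.
    static final Pattern VAR_PATTERN =
        Pattern.compile("\\$\\{[^\\}\\$\u0020]+\\}");

    // Returns the first ${...} reference found in value, or null if none.
    // Note that a new Matcher is created on every call.
    static String firstVar(String value) {
        Matcher match = VAR_PATTERN.matcher(value);
        return match.find() ? match.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstVar("/tmp/${user.name}/staging")); // ${user.name}
        System.out.println(firstVar("/tmp/staging"));              // null
        System.out.println(firstVar("${has space}"));              // null
    }
}
```

The second case is the common one: the value contains no '$' at all, yet a Matcher is still allocated and the regex is still run just to discover there is nothing to substitute.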

'Tis far simpler to bail out early, skipping the expensive regex evaluation in the majority of cases, by adding a simple check:

{code}
if (expr == null) {
  return null;
}
if (!expr.contains("$")) {
  return expr;
}
{code}

(The new check is the second if condition above).
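Putting the pieces together, the following is a simplified, hypothetical sketch of a substituteVars-style method with the proposed fast path in front of the regex loop. It is not Hadoop's actual implementation: variables are resolved from a plain Map rather than from the Configuration and system properties, a fixed depth limit (the assumed constant MAX_SUBST) stands in for whatever loop protection the real code uses, and the class name FastPathSubstitution is illustrative.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for Configuration.substituteVars.
class FastPathSubstitution {
    static final Pattern VAR_PATTERN =
        Pattern.compile("\\$\\{[^\\}\\$\u0020]+\\}");
    static final int MAX_SUBST = 20; // assumed depth limit

    static String substituteVars(String expr, Map<String, String> vars) {
        if (expr == null) {
            return null;
        }
        // Fast path: a value with no '$' cannot contain a ${...} reference,
        // so return it unchanged without creating a Matcher.
        if (!expr.contains("$")) {
            return expr;
        }
        String eval = expr;
        for (int s = 0; s < MAX_SUBST; s++) {
            Matcher match = VAR_PATTERN.matcher(eval);
            if (!match.find()) {
                return eval; // no (more) references to expand
            }
            String var = match.group();
            var = var.substring(2, var.length() - 1); // strip "${" and "}"
            String val = vars.get(var);
            if (val == null) {
                return eval; // unbound variable: leave ${var} as-is
            }
            // Splice the value in place of the reference and look again.
            eval = eval.substring(0, match.start()) + val + eval.substring(match.end());
        }
        throw new IllegalStateException(
            "Variable substitution depth too large: " + MAX_SUBST + " " + expr);
    }
}
```

With this ordering, a plain value such as "plain/value" is returned as the very same String object after a single String.contains scan, while "/tmp/${user.name}/x" still goes through the regex loop.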

Many users assume that Configuration.get() is a simple Map lookup and call it inside map/reduce functions; when those functions are simple, the repeated lookups add up to non-trivial overhead.
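On the user side, the usual mitigation is to read a property once (for example in a Mapper's setup() method) rather than once per record. The sketch below is a hypothetical, Hadoop-free illustration: confGet() stands in for Configuration.get() and counts its calls, and the key "my.job.prefix" and its value are made up.

```java
import java.util.List;

// Hypothetical illustration: confGet() is a stand-in for Configuration.get().
class HoistedLookup {
    static int lookups = 0;

    // Stand-in for a configuration lookup; counts how often it is called.
    static String confGet(String key) {
        lookups++;
        return key.equals("my.job.prefix") ? "x-" : null;
    }

    // Anti-pattern: one configuration lookup per record.
    static int perRecord(List<String> records) {
        int total = 0;
        for (String r : records) {
            String prefix = confGet("my.job.prefix");
            total += (prefix + r).length();
        }
        return total;
    }

    // Preferred: look the property up once, then reuse it for every record.
    static int hoisted(List<String> records) {
        String prefix = confGet("my.job.prefix");
        int total = 0;
        for (String r : records) {
            total += (prefix + r).length();
        }
        return total;
    }
}
```

Both methods compute the same result; the first performs one lookup per record, the second exactly one. Even with the fast path in place, hoisting avoids the remaining per-call cost of get().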

> Configuration.get() is unnecessarily slow
> -----------------------------------------
>
>                 Key: HADOOP-11506
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11506
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Dmitriy V. Ryaboy
>
> Profiling several large Hadoop jobs, we discovered that a surprising amount of time was spent inside Configuration.get, more specifically, in regex matching caused by the substituteVars call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)