Posted to common-issues@hadoop.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2015/01/22 23:45:35 UTC
[jira] [Commented] (HADOOP-11506) Configuration.get() is unnecessarily slow
[ https://issues.apache.org/jira/browse/HADOOP-11506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288362#comment-14288362 ]
Dmitriy V. Ryaboy commented on HADOOP-11506:
--------------------------------------------
Most property values contain no variables to substitute, so substituteVars returns from the following code block:
{code}
if (!match.find()) {
  return eval;
}
{code}
Getting there requires creating a matcher, allocating a HashSet, and evaluating the regex:
{code}
private static final Pattern VAR_PATTERN =
    Pattern.compile("\\$\\{[^\\}\\$\u0020]+\\}");
{code}
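To make the cost concrete, here is a minimal sketch of what that pattern accepts: "${", then one or more characters that are not '}', '$', or a space, then "}". The class name VarPatternDemo and the helper hasVariable are made up for illustration and are not part of Hadoop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VarPatternDemo {
    // Same pattern as quoted above.
    static final Pattern VAR_PATTERN =
        Pattern.compile("\\$\\{[^\\}\\$\u0020]+\\}");

    // Hypothetical helper: true if the value contains at least one
    // ${...} reference. Each call allocates a Matcher and runs the regex.
    static boolean hasVariable(String value) {
        Matcher match = VAR_PATTERN.matcher(value);
        return match.find();
    }

    public static void main(String[] args) {
        System.out.println(hasVariable("/tmp/hadoop-${user.name}")); // true
        System.out.println(hasVariable("4096"));                     // false
    }
}
```

A value such as "4096" still pays for the Matcher allocation and the scan even though it can never match.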
It is far simpler to bail out early and skip the expensive regex evaluation in the majority of cases by adding a simple check:
{code}
if (expr == null) {
  return null;
}
if (!expr.contains("$")) {
  return expr;
}
{code}
(The new check is the second if condition above.)
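As a rough sketch of how the early exit sits in front of the regex work, the standalone class below applies the same two checks before creating a matcher. It is not Hadoop's implementation: the class FastPathSubstitution, the method substitute, and the Map-based lookup are hypothetical stand-ins, and the substitution is a simple single pass.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FastPathSubstitution {
    static final Pattern VAR_PATTERN =
        Pattern.compile("\\$\\{[^\\}\\$\u0020]+\\}");

    // Hypothetical stand-in for a substitution routine: resolves ${name}
    // references against the supplied map in a single pass.
    static String substitute(String expr, Map<String, String> props) {
        if (expr == null) {
            return null;
        }
        // Fast path: a value without '$' cannot contain a ${...} reference,
        // so return it before allocating a Matcher or running the regex.
        if (!expr.contains("$")) {
            return expr;
        }
        Matcher match = VAR_PATTERN.matcher(expr);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (match.find()) {
            // Strip the leading "${" and trailing "}" to get the name.
            String name = expr.substring(match.start() + 2, match.end() - 1);
            String value = props.get(name);
            out.append(expr, last, match.start());
            // Unknown variables are left in place unchanged.
            out.append(value != null ? value : match.group());
            last = match.end();
        }
        out.append(expr, last, expr.length());
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("user.name", "alice");
        System.out.println(substitute("4096", props));
        System.out.println(substitute("/tmp/hadoop-${user.name}", props));
    }
}
```

String.contains is a plain character scan with no allocation, so the common case pays only for that scan.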
Many users assume that Configuration.get() is a plain Map lookup and call it inside map/reduce functions, which adds up to non-trivial overhead when those functions are simple.
> Configuration.get() is unnecessarily slow
> -----------------------------------------
>
> Key: HADOOP-11506
> URL: https://issues.apache.org/jira/browse/HADOOP-11506
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Dmitriy V. Ryaboy
>
> Profiling several large Hadoop jobs, we discovered that a surprising amount of time was spent inside Configuration.get(), more specifically in the regex matching triggered by the substituteVars call.