You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Miklos Szegedi (JIRA)" <ji...@apache.org> on 2016/12/02 03:04:58 UTC
[jira] [Commented] (YARN-5714) ContainerExecutor does not order environment map

    [ https://issues.apache.org/jira/browse/YARN-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713858#comment-15713858 ] 

Miklos Szegedi commented on YARN-5714:
--------------------------------------

Thank you, for the patch, [~rcatherinot]!
1. "%%A%%" should not match any variable in Windows but the current patch returns "A". Double %% is replaced with single %
2. This is a potential double increase of i without checking i < l
{code}
994	            if (c == '\\') {
995	              i++;
996	            }
997	            if (c == '\'') {
998	              break;
999	            }
1000	            i++;
{code}
3. There is a typo in this sentence:
{code}
855	      // 2 - we need a map implementation that support entries to be
{code}

4. You have an iteration that has a depth of 3. This will run at every container launch, so it might cause perf issues. In particular {code}copy.containsKey(envDep){code} will be called again and again (6 times?) until all dependencies are added, if a set is ordered like this:
{code}
D=$A $B $C
A=*
B=*
C=*
{code}

Have you considered an algorithm, where you scan the dependencies only once? You may not even need the hash map for it in this case.
{code}
for (env: envsI)
  stack.push(env)
while(!stack.empty)
  env = stack.pop()
  if (!env.marked)
    env.mark
    if (env.dependencies.empty)
      stack.push(env)
      for(dep : end.dependencies())
        stack.push(dep)
  else
    ordered.putifnotthere(env);
{code}
5. It might be slower (needs to be tested) but have you considered regex for parsing? It would make the code shorter and easier to understand.

> ContainerExecutor does not order environment map
> ------------------------------------------------
>
>                 Key: YARN-5714
>                 URL: https://issues.apache.org/jira/browse/YARN-5714
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.4.1, 2.5.2, 2.7.3, 2.6.4, 3.0.0-alpha1
>         Environment: all (linux and windows alike)
>            Reporter: Remi Catherinot
>            Assignee: Remi Catherinot
>            Priority: Trivial
>              Labels: oct16-medium
>         Attachments: YARN-5714.001.patch, YARN-5714.002.patch, YARN-5714.003.patch, YARN-5714.004.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> when dumping the launch container script, environment variables are dumped based on the order internally used by the map implementation (hash based). It does not take into consideration that some env varibales may refer each other, and so that some env variables must be declared before those referencing them.
> In my case, i ended up having LD_LIBRARY_PATH which was depending on HADOOP_COMMON_HOME being dumped before HADOOP_COMMON_HOME. Thus it had a wrong value and so native libraries weren't loaded. jobs were running but not at their best efficiency. This is just a use case falling into that bug, but i'm sure others may happen as well.
> I already have a patch running in my production environment, i just estimate to 5 days for packaging the patch in the right fashion for JIRA + try my best to add tests.
> Note : the patch is not OS aware with a default empty implementation. I will only implement the unix version on a 1st release. I'm not used to windows env variables syntax so it will take me more time/research for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org