You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2016/01/29 18:58:39 UTC

[jira] [Updated] (YARN-4594) container-executor fails to remove directory tree when chmod required

     [ https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-4594:
-----------------------------
    Summary: container-executor fails to remove directory tree when chmod required  (was: Fix test-container-executor.c to pass)

Thanks for the patch, Colin!

I noticed we're using openat, fchmodat, and unlinkat for the first time.  I suspect most other POSIX-like distributions support this, but I think these were only recently added to MacOS X (in 10.9 Yosemite).  I'm not sure if anyone uses container-executor for MacOS X (or if container-executor even compiles/works for MacOS X today). but adding these could break the native build for those using older MacOS X versions.  One alternative would be a double-pass with ftw where we walk just the directories first changing permissions then followed by the walk it does today.  The directory trees involved are going to be very shallow, so it's probably not a problem in practice if we decided to go that route.

If we stick with the custom walker then here are some comments on the patch:
* I think this needs to use strerror(-fd):
{code}
  if (fd < 0) {
    fprintf(LOGFILE, "error opening %s: %s\n", name, strerror(ret));
    goto done;
  }
{code}
* There's no check for an error being encountered by readdir and therefore no logging if it does occur
* Sometimes recursive_unlink_helper is returning errno and sometimes it is returning -errno.  For example, the "failed to stat" path will set ret=errno and return -ret as -errno, but the "failed to unlink" path will set ret=-errno and thus return errno.



> container-executor fails to remove directory tree when chmod required
> ---------------------------------------------------------------------
>
>                 Key: YARN-4594
>                 URL: https://issues.apache.org/jira/browse/YARN-4594
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: YARN-4594.001.patch
>
>
> test-container-executor.c doesn't work:
> * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually /usr/bin/ls on many systems.
> * The recursive delete logic in container-executor.c fails -- nftw does the wrong thing when confronted with directories with the wrong mode (permission bits), leading to an attempt to run rmdir on a non-empty directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)