You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2017/11/14 18:47:45 UTC

[Impala-ASF-CR] IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process

Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8546


Change subject: IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process
......................................................................

IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process

Running shell commands from impalad can be problematic, because using popen leads
to forking which causes a spike in virtual memory. To avoid this, "ls" is replaced
with posix api calls.

FileDescriptorMap fd_desc_ was only used to get the number of file descriptors, so
it was unneccesery work to initialize it. It is removed, and only the number of file
descriptors is computed.

No automatic test is created for this function, because there is no way to know the
"expected value" in advance, and the number of file desciptors can change anytime.

Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
---
M be/src/util/process-state-info.cc
M be/src/util/process-state-info.h
2 files changed, 15 insertions(+), 25 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8546/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 7:

Flaky test, will restart the Jenkins job.


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Nov 2017 01:56:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 3:

> > Dan, what's your take?
 > 
 > I think the question is where does the structure that is returned
 > come from? In the non-reentrant C lib functions, the problem is
 > usually that they return a pointer to a static structure, so if you
 > call the function again (or concurrently), the returned structure
 > can be overwritten.

On, just saw that comment about modern implementations being thread safe and readdir_r being deprecated. So, yeah seems like readdir() is the way to go.


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 17:49:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1503/


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Nov 2017 21:57:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process
......................................................................


Patch Set 2:

> I looked around, and it looks like C++17 and boost have filesystem
 > abstractions, but I think readdir() is simple enough.

The boost abstractions have been problematic for us in the past (e.g. bugs that lead to exceptions thrown even when using error propagating interface), so agree using readdir_r is good here.


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Nov 2017 01:17:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8546/2/be/src/util/process-state-info.h
File be/src/util/process-state-info.h:

http://gerrit.cloudera.org:8080/#/c/8546/2/be/src/util/process-state-info.h@99
PS2, Line 99:   /// Number of file descriptors.
> Please clarify what kinds of file descriptors this counts.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Nov 2017 01:47:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 3:

> Patch Set 3:
> 
> > > Patch Set 3:
>  > >
>  > > > Patch Set 2:
>  > > >
>  > > > Based on my reading of "man readdir", it's not threadsafe. I
>  > think the only usage here is if a user hits "http://impalad:.../"
>  > to see the web UI. It's possible that there's a lock somewhere
>  > preventing concurrent use, but given that this is already
>  > reasonably expensive, I'd recommend using the reentrant readdir_r
>  > instead.
>  > > >
>  > > > I looked around, and it looks like C++17 and boost have
>  > filesystem abstractions, but I think readdir() is simple enough.
>  > >
>  > > I looked at the manpage of readdir() here (http://man7.org/linux/man-pages/man3/readdir.3.html)
>  > and it claims that "in modern implementations (including the glibc
>  > implementation), concurrent calls to readdir()
>  > > that specify different directory streams are thread-safe.". I
>  > tried this out at https://gist.github.com/lekv/508f540053340ffcf095d49b27c4317d
>  > and was not able to hit a race (Note that you cannot re-use a dir
>  > stream). My interpretation is that you cannot use two threads to
>  > scan a directory in parallel by using the same dir stream (as in
>  > "the same pointer") because one thread's call will overwrite the
>  > internal buffer of the other. Creating a fresh dir stream seems
>  > fine to me though.
>  > 
>  > Ok, that makes sense.
> 
> I would prefer readdir, if we can assume that it is threadsafe. I grepped around, and squeasel + kudu/util also uses readdir, though there are some comments in Kudu that state that it is not threadsafe.
> 
> scandir can be also a possibility ( man7.org/linux/man-pages/man3/scandir.3.html ).

Looking at the definition of __dirstream in the libc source of my dev machine shows that there are no shared datastructures:

struct __dirstream
  {
    int fd;         /* File descriptor.  */

    __libc_lock_define (, lock) /* Mutex lock for this structure.  */

    size_t allocation;      /* Space allocated for the block.  */
    size_t size;        /* Total valid data in the block.  */
    size_t offset;      /* Current offset into the block.  */

    off_t filepos;      /* Position of next entry to read.  */

    int errcode;        /* Delayed error code.  */

    /* Directory block.  We must make sure that this block starts
       at an address that is aligned adequately enough to store
       dirent entries.  Using the alignment of "void *" is not
       sufficient because dirents on 32-bit platforms can require
       64-bit alignment.  We use "long double" here to be consistent
       with what malloc uses.  */
    char data[0] __attribute__ ((aligned (__alignof__ (long double))));
  };

In my small test program, each thread also kept its own file descriptor, making me feel confident that this usage of readdir is thread safe.

Dan, what's your take?


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 03:50:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1503/


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Nov 2017 01:30:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 3:

> Patch Set 3:
> 
> > Patch Set 2:
> > 
> > Based on my reading of "man readdir", it's not threadsafe. I think the only usage here is if a user hits "http://impalad:.../" to see the web UI. It's possible that there's a lock somewhere preventing concurrent use, but given that this is already reasonably expensive, I'd recommend using the reentrant readdir_r instead.
> > 
> > I looked around, and it looks like C++17 and boost have filesystem abstractions, but I think readdir() is simple enough.
> 
> I looked at the manpage of readdir() here (http://man7.org/linux/man-pages/man3/readdir.3.html) and it claims that "in modern implementations (including the glibc implementation), concurrent calls to readdir()
> that specify different directory streams are thread-safe.". I tried this out at https://gist.github.com/lekv/508f540053340ffcf095d49b27c4317d and was not able to hit a race (Note that you cannot re-use a dir stream). My interpretation is that you cannot use two threads to scan a directory in parallel by using the same dir stream (as in "the same pointer") because one thread's call will overwrite the internal buffer of the other. Creating a fresh dir stream seems fine to me though.

Ok, that makes sense.


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Nov 2017 04:38:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 3:

(3 comments)

> Based on my reading of "man readdir", it's not threadsafe. I think
 > the only usage here is if a user hits "http://impalad:.../" to see
 > the web UI. It's possible that there's a lock somewhere preventing
 > concurrent use, but given that this is already reasonably
 > expensive, I'd recommend using the reentrant readdir_r instead.
 > 
 > I looked around, and it looks like C++17 and boost have filesystem
 > abstractions, but I think readdir() is simple enough.

I have replaced readdir() with readdir_r(), but the solution is only ok for Linux, and may cause problems on other Unix systems, because the size of dirent.d_name is not always fix.

The general solution in linux.die.net/man/3/readdir_r can be actually problematic, see womble.decadent.org.uk/readdir_r-advisory.html The scenario is not.

It is easy to allocate a buffer whose size will be surely enough in /proc/self/fd, but I am unsure about the ideal solution.
- Should I care about portability to other Posix variants?
- Is there a way to express "compilation error if other Posix variant than Linux"?
- Is there a maximum size for filenames? Most examples use 255 (+1 for \0), but I do not know if it is an official value or not.
- Is there a guideline for the maximum size to allocate on stack? Is ~255 ok?

http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
> Please describe in this line what the change does, not what it should do, i
Done


http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG@11
PS2, Line 11: Posix
> nit: Posix is a name and should be capitalized.
Done


http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG@18
PS2, Line 18: way to know the "expected value" in advance, and the number of file desciptors can
> Couldn't we test that a reasonable minimum number of file descriptors is re
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Nov 2017 01:46:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1505/


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Nov 2017 01:56:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Philip Zeyliger, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8546

to look at the new patch set (#5).

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................

IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Running shell commands from impalad can be problematic, because using popen() leads
to forking which causes a spike in virtual memory. To avoid this, "ls" is replaced
with POSIX API calls.

FileDescriptorMap fd_desc_ was only used to get the number of file descriptors, so
it was unneccesery work to initialize it. It is removed, and only the number of file
descriptors is computed.

The automatic test for this function is only a sanity check,  because there is no
way to know the "expected value" in advance, and the number of file desciptors can
change anytime.

Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
---
M be/src/util/proc-info-test.cc
M be/src/util/process-state-info.cc
M be/src/util/process-state-info.h
3 files changed, 32 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8546/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 4:

Lars, can you take care of doing the +2 review for this one?


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 23:34:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process
......................................................................


Patch Set 2:

Based on my reading of "man readdir", it's not threadsafe. I think the only usage here is if a user hits "http://impalad:.../" to see the web UI. It's possible that there's a lock somewhere preventing concurrent use, but given that this is already reasonably expensive, I'd recommend using the reentrant readdir_r instead.

I looked around, and it looks like C++17 and boost have filesystem abstractions, but I think readdir() is simple enough.


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 19:09:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 3:

> > Patch Set 3:
 > >
 > > > Dan, what's your take?
 > >
 > > I think the question is where does the structure that is returned
 > come from? In the non-reentrant C lib functions, the problem is
 > usually that they return a pointer to a static structure, so if you
 > call the function again (or concurrently), the returned structure
 > can be overwritten.
 > 
 > I double checked the opendir() implementation in sysdeps/posix/opendir.c
 > and it does not return any static structures but calls malloc() to
 > get a new DIR*.

I was talking about the return value of readdir() (dirent*).


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 18:08:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8546/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8546/4//COMMIT_MSG@11
PS4, Line 11: api
nit: acronyms are usually all caps


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.h
File be/src/util/process-state-info.h:

http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.h@77
PS4, Line 77: /// File Descriptors information will be read from /proc/pid/fd.
nit: /proc/self/fd


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc
File be/src/util/process-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@155
PS4, Line 155: does
nit: do not count.


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@156
PS4, Line 156: self
nit: either use getpid() here to be consistent with the rest of the file, or change the other occurrences to /proc/self/*


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@159
PS4, Line 159: dir
Can you try to think of better variable names? You have a DIR called d, and something else called dir. DIR is really a directory stream, and readdir returns directory entries.


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@160
PS4, Line 160:     while ((dir = readdir(d)) != nullptr) ++fd_count;
Please add a brief comment why your usage of readdir is thread-safe here.


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@163
PS4, Line 163: std::
You shouldn't need std:: here



-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 23:55:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 2:

Sorry, the end of my sentence has disappeared:
"The scenario is not." ->
The scenario is not relevant in this case, but it is an interesting read anyway.


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Nov 2017 01:51:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................

IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Running shell commands from impalad can be problematic, because using popen() leads
to forking which causes a spike in virtual memory. To avoid this, "ls" is replaced
with POSIX API calls.

FileDescriptorMap fd_desc_ was only used to get the number of file descriptors, so
it was unneccesery work to initialize it. It is removed, and only the number of file
descriptors is computed.

The automatic test for this function is only a sanity check,  because there is no
way to know the "expected value" in advance, and the number of file desciptors can
change anytime.

Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Reviewed-on: http://gerrit.cloudera.org:8080/8546
Reviewed-by: Lars Volker <lv...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/util/proc-info-test.cc
M be/src/util/process-state-info.cc
M be/src/util/process-state-info.h
3 files changed, 35 insertions(+), 44 deletions(-)

Approvals:
  Lars Volker: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 4: Code-Review+1

Looks fine as far as I'm concerned. I learned something about readdir()!


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 23:29:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Nov 2017 21:56:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Philip Zeyliger, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8546

to look at the new patch set (#4).

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................

IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Running shell commands from impalad can be problematic, because using popen() leads
to forking which causes a spike in virtual memory. To avoid this, "ls" is replaced
with Posix api calls.

FileDescriptorMap fd_desc_ was only used to get the number of file descriptors, so
it was unneccesery work to initialize it. It is removed, and only the number of file
descriptors is computed.

The automatic test for this function is only a sanity check,  because there is no
way to know the "expected value" in advance, and the number of file desciptors can
change anytime.

Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
---
M be/src/util/proc-info-test.cc
M be/src/util/process-state-info.cc
M be/src/util/process-state-info.h
3 files changed, 16 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8546/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.h
File be/src/util/process-state-info.h:

http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.h@61
PS5, Line 61: /// Below are some of the Process status information that will be read
> nit: long line. Please have a look at how to configure your text editor to 
Done


http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc
File be/src/util/process-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc@36
PS5, Line 36: using boost::algorithm::is_any_of;
> nit: sort alphabetically?
Done


http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc@157
PS5, Line 157:     // readdir() is not thread-safe according to its man page, but in glibc
> Can you include a reference to the source (e.g. https://www.gnu.org/softwar
Done


http://gerrit.cloudera.org:8080/#/c/8546/6/be/src/util/process-state-info.cc
File be/src/util/process-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/8546/6/be/src/util/process-state-info.cc@161
PS6, Line 161:       if(dir_entry->d_name[0] != '.') ++fd_count; // . and .. do not count
www.gnu.org/software/libc/manual/html_node/Reading_002fClosing-Directory.html states:
"Portability Note: On some systems readdir may not return entries for . and ..,"
Because of this, the filename must be checked to return the exact number of descriptors. As every filename in fd directory is a number, it is enough to check the first character.



-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Nov 2017 18:51:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.h
File be/src/util/process-state-info.h:

http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.h@61
PS5, Line 61: /// Below are some of the Process status information that will be read from /proc/self/status.
nit: long line. Please have a look at how to configure your text editor to highlight these for you. Alternatively check out git hooks to check for changed lines that are too long.


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc
File be/src/util/process-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@159
PS4, Line 159: ry 
> Done
Much better!


http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc
File be/src/util/process-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc@36
PS5, Line 36: using std::to_string;
nit: sort alphabetically?


http://gerrit.cloudera.org:8080/#/c/8546/5/be/src/util/process-state-info.cc@157
PS5, Line 157:     // readdir() is not thread-safe according to its man page, but in glibc
Can you include a reference to the source (e.g. https://www.gnu.org/software/libc/manual/html_node/Reading_002fClosing-Directory.html) ?



-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Sat, 18 Nov 2017 00:54:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Nov 2017 21:56:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 7: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Nov 2017 05:28:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 3:

> Patch Set 3:
> 
> > Dan, what's your take?
> 
> I think the question is where does the structure that is returned come from? In the non-reentrant C lib functions, the problem is usually that they return a pointer to a static structure, so if you call the function again (or concurrently), the returned structure can be overwritten.

I double checked the opendir() implementation in sysdeps/posix/opendir.c and it does not return any static structures but calls malloc() to get a new DIR*.


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 17:58:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 3:

> Dan, what's your take?

I think the question is where does the structure that is returned come from? In the non-reentrant C lib functions, the problem is usually that they return a pointer to a static structure, so if you call the function again (or concurrently), the returned structure can be overwritten.


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 17:47:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Philip Zeyliger, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8546

to look at the new patch set (#3).

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................

IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Running shell commands from impalad can be problematic, because using popen leads
to forking which causes a spike in virtual memory. To avoid this, "ls" is replaced
with Posix api calls.

FileDescriptorMap fd_desc_ was only used to get the number of file descriptors, so
it was unneccesery work to initialize it. It is removed, and only the number of file
descriptors is computed.

The automatic test for this function is only a sanity check,  because there is no
way to know the "expected value" in advance, and the number of file desciptors can
change anytime.

Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
---
M be/src/util/proc-info-test.cc
M be/src/util/process-state-info.cc
M be/src/util/process-state-info.h
3 files changed, 16 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8546/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8546/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8546/4//COMMIT_MSG@11
PS4, Line 11: api
> nit: acronyms are usually all caps
Done


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.h
File be/src/util/process-state-info.h:

http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.h@77
PS4, Line 77: /// File Descriptors information will be read from /proc/pid/fd.
> nit: /proc/self/fd
Done


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc
File be/src/util/process-state-info.cc:

http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@155
PS4, Line 155: does
> nit: do not count.
Done


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@156
PS4, Line 156: self
> nit: either use getpid() here to be consistent with the rest of the file, o
Done


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@159
PS4, Line 159: dir
> Can you try to think of better variable names? You have a DIR called d, and
Done


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@160
PS4, Line 160:     while ((dir = readdir(d)) != nullptr) ++fd_count;
> Please add a brief comment why your usage of readdir is thread-safe here.
Done


http://gerrit.cloudera.org:8080/#/c/8546/4/be/src/util/process-state-info.cc@163
PS4, Line 163: std::
> You shouldn't need std:: here
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Sat, 18 Nov 2017 00:44:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 3:

> Patch Set 3:
> 
> > > Patch Set 3:
>  > >
>  > > > Dan, what's your take?
>  > >
>  > > I think the question is where does the structure that is returned
>  > come from? In the non-reentrant C lib functions, the problem is
>  > usually that they return a pointer to a static structure, so if you
>  > call the function again (or concurrently), the returned structure
>  > can be overwritten.
>  > 
>  > I double checked the opendir() implementation in sysdeps/posix/opendir.c
>  > and it does not return any static structures but calls malloc() to
>  > get a new DIR*.
> 
> I was talking about the return value of readdir() (dirent*).

Ah, I see, sorry for the confusion.

readdir() only accesses the DIR* struct that is passed as a parameter (I checked that). The DIR* struct gets created by opendir() and does not share data structures with other DIR*s. It contains a fd, which is different for every DIR*, too. readdir() eventually calls getdents(2) and copies the result into the DIR* struct.


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 18:45:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-5624: ProcessStateInfo::ReadProcFileDescriptorInfo() should not fork a process
Please describe in this line what the change does, not what it should do, i.e. "prevent fork".


http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG@11
PS2, Line 11: posix
nit: Posix is a name and should be capitalized.


http://gerrit.cloudera.org:8080/#/c/8546/2//COMMIT_MSG@18
PS2, Line 18: "expected value" in advance, and the number of file desciptors can change anytime.
Couldn't we test that a reasonable minimum number of file descriptors is returned, e.g. >0?


http://gerrit.cloudera.org:8080/#/c/8546/2/be/src/util/process-state-info.h
File be/src/util/process-state-info.h:

http://gerrit.cloudera.org:8080/#/c/8546/2/be/src/util/process-state-info.h@99
PS2, Line 99:   /// Number of file descriptors.
Please clarify what kinds of file descriptors this counts.



-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 Nov 2017 19:09:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Philip Zeyliger, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8546

to look at the new patch set (#6).

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................

IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Running shell commands from impalad can be problematic, because using popen() leads
to forking which causes a spike in virtual memory. To avoid this, "ls" is replaced
with POSIX API calls.

FileDescriptorMap fd_desc_ was only used to get the number of file descriptors, so
it was unneccesery work to initialize it. It is removed, and only the number of file
descriptors is computed.

The automatic test for this function is only a sanity check,  because there is no
way to know the "expected value" in advance, and the number of file desciptors can
change anytime.

Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
---
M be/src/util/proc-info-test.cc
M be/src/util/process-state-info.cc
M be/src/util/process-state-info.h
3 files changed, 35 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/8546/6
-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 3:

> Patch Set 2:
> 
> Based on my reading of "man readdir", it's not threadsafe. I think the only usage here is if a user hits "http://impalad:.../" to see the web UI. It's possible that there's a lock somewhere preventing concurrent use, but given that this is already reasonably expensive, I'd recommend using the reentrant readdir_r instead.
> 
> I looked around, and it looks like C++17 and boost have filesystem abstractions, but I think readdir() is simple enough.

I looked at the manpage of readdir() here (http://man7.org/linux/man-pages/man3/readdir.3.html) and it claims that "in modern implementations (including the glibc implementation), concurrent calls to readdir()
that specify different directory streams are thread-safe.". I tried this out at https://gist.github.com/lekv/508f540053340ffcf095d49b27c4317d and was not able to hit a race (Note that you cannot re-use a dir stream). My interpretation is that you cannot use two threads to scan a directory in parallel by using the same dir stream (as in "the same pointer") because one thread's call will overwrite the internal buffer of the other. Creating a fresh dir stream seems fine to me though.


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Nov 2017 03:52:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/8546 )

Change subject: IMPALA-5624: Replace "ls -l" with opendir() in ProcessStateInfo
......................................................................


Patch Set 3:

> > Patch Set 3:
 > >
 > > > Patch Set 2:
 > > >
 > > > Based on my reading of "man readdir", it's not threadsafe. I
 > think the only usage here is if a user hits "http://impalad:.../"
 > to see the web UI. It's possible that there's a lock somewhere
 > preventing concurrent use, but given that this is already
 > reasonably expensive, I'd recommend using the reentrant readdir_r
 > instead.
 > > >
 > > > I looked around, and it looks like C++17 and boost have
 > filesystem abstractions, but I think readdir() is simple enough.
 > >
 > > I looked at the manpage of readdir() here (http://man7.org/linux/man-pages/man3/readdir.3.html)
 > and it claims that "in modern implementations (including the glibc
 > implementation), concurrent calls to readdir()
 > > that specify different directory streams are thread-safe.". I
 > tried this out at https://gist.github.com/lekv/508f540053340ffcf095d49b27c4317d
 > and was not able to hit a race (Note that you cannot re-use a dir
 > stream). My interpretation is that you cannot use two threads to
 > scan a directory in parallel by using the same dir stream (as in
 > "the same pointer") because one thread's call will overwrite the
 > internal buffer of the other. Creating a fresh dir stream seems
 > fine to me though.
 > 
 > Ok, that makes sense.

I would prefer readdir, if we can assume that it is threadsafe. I grepped around, and squeasel + kudu/util also uses readdir, though there are some comments in Kudu that state that it is not threadsafe.

scandir can be also a possibility ( man7.org/linux/man-pages/man3/scandir.3.html ).


-- 
To view, visit http://gerrit.cloudera.org:8080/8546
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibffae8069a62e100abbfa7d558b49040b095ddc0
Gerrit-Change-Number: 8546
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Nov 2017 01:03:41 +0000
Gerrit-HasComments: No