You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2018/05/16 21:21:32 UTC

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Hello Todd Lipcon,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/10435

to review the following change.


Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................

KUDU-2427: retry more system calls on EINTR

In order to collect our own stack traces, we send ourselves a SIGUSR2. The
diagnostics log initiates collection every 60s, as do service queue
overflows. As such, it's really important that we retry every interruptible
system call rather than passing the EINTR up the call stack as a failure.

For whatever reason this happens more frequently on Ubuntu 18.04, though
maybe it's because I've placed my test directory on tmpfs. For example, I
can easily repro a crash due to non-existent retry with the following
command line:

  bin/tablet_server-test --gtest_repeat=1000 --gtest_throw_on_failure \
    --diagnostics_log_stack_traces_interval_ms=100 \
    --unlock_experimental_flags --gtest_filter=*KUDU_177

I went through env_posix.cc and looked for EINTR in every syscall's manpage.
If I found it, I made sure the call was surrounded by RETRY_ON_EINTR.

Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
---
M src/kudu/util/env_posix.cc
1 file changed, 33 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/10435/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10435/4/src/kudu/security/tls_socket.cc
File src/kudu/security/tls_socket.cc:

http://gerrit.cloudera.org:8080/#/c/10435/4/src/kudu/security/tls_socket.cc@63
PS4, Line 63:   RETRY_ON_EINTR(bytes_written, SSL_write(ssl_.get(), buf, amt));
This (and the equivalent change to SSL_read) are a little funky; is it safe to expect errno to be the primary vehicle for delivering error information from OpenSSL? Normally errno is only used after libc calls.


http://gerrit.cloudera.org:8080/#/c/10435/4/src/kudu/util/net/socket.cc
File src/kudu/util/net/socket.cc:

http://gerrit.cloudera.org:8080/#/c/10435/4/src/kudu/util/net/socket.cc@a465
PS4, Line 465: 
             : 
             : 
             : 
The changes in tls_socket.cc allow this (and the equivalent in BlockingRecv) to be removed.


http://gerrit.cloudera.org:8080/#/c/10435/4/src/kudu/util/net/socket.cc@128
PS4, Line 128:   return (err == EAGAIN) || (err == EWOULDBLOCK);
I'm not as confident about this change. I made it because I think EINTR should be handled at the "ends" and thus it shouldn't show up at higher levels of the stack.



-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 17 May 2018 22:07:09 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/net/socket.cc
File src/kudu/util/net/socket.cc:

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/net/socket.cc@465
PS3, Line 465:     Status s = Write(buf, num_to_write, &inc_num_written);
             :     tot_written += inc_num_written;
             :     buf += inc_nu
> ah... the RETRY_ON_EINTR around SSL_read is actually a little concerning. I
Yep, SSL_read() and SSL_write() seemed messy when I looked at them last time.  As they say in the doc: '... The behaviour of SSL_read() depends on the underlying BIO. ...'

But first of all, the corresponding change with RETRY_ON_EINTR() in tls_socket.cc looks suspicious to me: while working with OpenSSL functions, it's necessary to clean up the errors from the stack using SSL_get_error().

My another concern is that SSL_read()/SSL_write() might handle EINTR themselves.



-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 21 May 2018 18:59:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 7: Code-Review+2

LGTM.

Thank you for searching for those additional calls of EINTR-specific calls.  Maybe, it's worth checking if Todd has more input on this.


-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 23 May 2018 22:55:13 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10435

to look at the new patch set (#7).

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................

KUDU-2427: retry more system calls on EINTR

In order to collect its own stack traces, Kudu periodically sends itself a
SIGUSR2. The diagnostics log initiates stack collection every 60s, as do
some service queue overflow events. In theory, the collection shouldn't
affect any ongoing syscalls because the SIGUSR2 signal handler is installed
with SA_RESTART; in practice, not all syscalls are restartable, and
precisely categorizing those that are and those that aren't is difficult. As
such, it's really important that we retry every interruptible syscall rather
than surfacing the EINTR up the call stack as a failure.

For whatever reason this happens more frequently on Ubuntu 18.04, though
maybe it's because I've placed my test directory on tmpfs. For example, I
can easily repro a crash due to non-existent retry with the following
command line:

  bin/tablet_server-test --gtest_repeat=1000 --gtest_throw_on_failure \
    --diagnostics_log_stack_traces_interval_ms=100 \
    --unlock_experimental_flags --gtest_filter=*KUDU_177

This patch also fixes KUDU-2151.

Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
---
M src/kudu/consensus/log_index.cc
M src/kudu/gutil/macros.h
M src/kudu/gutil/sysinfo.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_test.cc
M src/kudu/util/env-test.cc
M src/kudu/util/env_posix.cc
M src/kudu/util/net/socket.cc
M src/kudu/util/os-util.cc
M src/kudu/util/os-util.h
M src/kudu/util/pstack_watcher-test.cc
M src/kudu/util/pstack_watcher.cc
M src/kudu/util/semaphore.cc
M src/kudu/util/subprocess-test.cc
M src/kudu/util/subprocess.cc
15 files changed, 253 insertions(+), 138 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/10435/7
-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/net/socket.cc
File src/kudu/util/net/socket.cc:

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/net/socket.cc@465
PS3, Line 465:     Status s = Write(buf, num_to_write, &inc_num_written);
             :     tot_written += inc_num_written;
             :     buf += inc_nu
> given that Write and Recv might be overridden by TlsSocket, is this safe? I
Right, but take a look at tls_socket.cc: I changed TlsSocket's Write and Recv to also RETRY_ON_EINTR. The idea is that Socket's BlockingWrite and BlockingRecv should never see an EINTR out of Write/Recv, because either Socket's Write/Recv have retried it, or TlsSocket's overrides did.

If you think that's a reasonable design, I can add a DCHECK and/or documentation if you think it'd help.



-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 21 May 2018 17:31:17 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 3: Verified+1

Unrelated failure; filed KUDU-2444 for it.


-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 17 May 2018 17:50:45 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10435

to look at the new patch set (#4).

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................

KUDU-2427: retry more system calls on EINTR

In order to collect our own stack traces, we send ourselves a SIGUSR2. The
diagnostics log initiates collection every 60s, as do service queue
overflows. As such, it's really important that we retry every interruptible
system call rather than passing the EINTR up the call stack as a failure.

For whatever reason this happens more frequently on Ubuntu 18.04, though
maybe it's because I've placed my test directory on tmpfs. For example, I
can easily repro a crash due to non-existent retry with the following
command line:

  bin/tablet_server-test --gtest_repeat=1000 --gtest_throw_on_failure \
    --diagnostics_log_stack_traces_interval_ms=100 \
    --unlock_experimental_flags --gtest_filter=*KUDU_177

This patch also fixes KUDU-2151.

Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
---
M src/kudu/consensus/log_index.cc
M src/kudu/gutil/macros.h
M src/kudu/gutil/sysinfo.cc
M src/kudu/rpc/negotiation.cc
M src/kudu/security/tls_socket.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/util/env_posix.cc
M src/kudu/util/net/socket.cc
M src/kudu/util/os-util.cc
M src/kudu/util/os-util.h
M src/kudu/util/semaphore.cc
M src/kudu/util/subprocess.cc
12 files changed, 146 insertions(+), 113 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/10435/4
-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 3:

(1 comment)

> Seem like we might be missing a few more:
> - Socket::Close
> - Log::IndexChunk::~IndexChunk
> - a few things in sysinfo.cc (open and close calls)
> - os-util.cc DisableCoreDumps
> - various close() and open() calls in subprocess.cc

I'll push a new rev with these addressed.

I've also changed a couple places with manual EINTR handling to use RETRY_ON_EINTR instead.

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/env_posix.cc@300
PS3, Line 300:     RETRY_ON_EINTR(err, ::close(fd_));
> I think this might be tracked by KUDU-2151? worth adding it to the commit m
Yeah, I'll mention it in the commit message.



-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 17 May 2018 20:27:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10435

to look at the new patch set (#6).

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................

KUDU-2427: retry more system calls on EINTR

In order to collect our own stack traces, we send ourselves a SIGUSR2. The
diagnostics log initiates collection every 60s, as do service queue
overflows. As such, it's really important that we retry every interruptible
system call rather than passing the EINTR up the call stack as a failure.

For whatever reason this happens more frequently on Ubuntu 18.04, though
maybe it's because I've placed my test directory on tmpfs. For example, I
can easily repro a crash due to non-existent retry with the following
command line:

  bin/tablet_server-test --gtest_repeat=1000 --gtest_throw_on_failure \
    --diagnostics_log_stack_traces_interval_ms=100 \
    --unlock_experimental_flags --gtest_filter=*KUDU_177

This patch also fixes KUDU-2151.

Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
---
M src/kudu/consensus/log_index.cc
M src/kudu/gutil/macros.h
M src/kudu/gutil/sysinfo.cc
M src/kudu/rpc/negotiation.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/util/env_posix.cc
M src/kudu/util/net/socket.cc
M src/kudu/util/os-util.cc
M src/kudu/util/os-util.h
M src/kudu/util/semaphore.cc
M src/kudu/util/subprocess.cc
11 files changed, 142 insertions(+), 105 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/10435/6
-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has removed a vote on this change.

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/net/socket.cc
File src/kudu/util/net/socket.cc:

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/net/socket.cc@465
PS3, Line 465:     Status s = Write(buf, num_to_write, &inc_num_written);
             :     tot_written += inc_num_written;
             :     buf += inc_nu
> Yep, SSL_read() and SSL_write() seemed messy when I looked at them last tim
OK, I'll revert these changes, and the change to tls_socket.cc.



-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 21 May 2018 20:52:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/net/socket.cc
File src/kudu/util/net/socket.cc:

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/net/socket.cc@465
PS3, Line 465:     Status s = Write(buf, num_to_write, &inc_num_written);
             :     tot_written += inc_num_written;
             :     buf += inc_nu
given that Write and Recv might be overridden by TlsSocket, is this safe? I see for example in TlsSocket::Recv the following code:

    auto error_code = SSL_get_error(ssl_.get(), bytes_read);
    if (error_code == SSL_ERROR_WANT_READ) {
      if (save_errno != 0) {
        return Status::NetworkError("SSL_read error from " + remote.ToString(),
                                    ErrnoToString(save_errno), save_errno);
      }


-- maybe 'save_errno' here could be EINTR, so we end up returning an EINTR posix code to BlockingRecv or BlockingSend? I think we set some flag on OpenSSL to theoretically do auto-retry on EINTR but there was a bunch of messiness around this in the past. See 18e024cf8bcaea192efb63780802cc4c799bbb9c



-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 21 May 2018 17:21:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 5: Verified+1

(1 comment)

Overriding Jenkins, known flaky test.

http://gerrit.cloudera.org:8080/#/c/10435/5/src/kudu/util/semaphore.cc
File src/kudu/util/semaphore.cc:

http://gerrit.cloudera.org:8080/#/c/10435/5/src/kudu/util/semaphore.cc@66
PS5, Line 66:   if (errno == EAGAIN) {
This is also a semantic change, but a correct one, I think.



-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 17 May 2018 23:26:16 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 6:

(17 comments)

It seems we also have some calls with the EINTR errno semantic in src/kudu/tools.  Does it make sense to wrap those with RETRY_ON_EINTR() as well?

E.g.,

tools/tool_action_test.cc:  PCHECK(dup2(STDERR_FILENO, STDOUT_FILENO) == STDOUT_FILENO);
tools/tool_action_common.cc:    close(read_fd_);
tools/tool_action_common.cc:    close(write_fd_);

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/consensus/log_index.cc
File src/kudu/consensus/log_index.cc:

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/consensus/log_index.cc@116
PS6, Line 116: RETRY_ON_EINTR(ret, close(fd_));
It's highly unlikely, but is it worth logging about an error?


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/gutil/sysinfo.cc
File src/kudu/gutil/sysinfo.cc:

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/gutil/sysinfo.cc@301
PS6, Line 301: RETRY_ON_EINTR(ret, close(fd));
Is it worth logging about a failure (if it not EINTR, of course)?


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/rpc/negotiation.cc
File src/kudu/rpc/negotiation.cc:

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/rpc/negotiation.cc@119
PS6, Line 119:     RETRY_ON_EINTR(ready, ppoll(&poll_fd, 1, &ts, nullptr));
             : #else
             :     RETRY_ON_EINTR(ready, poll(&poll_fd, 1, remaining.ToMilliseconds()));
This update changes the logic a bit so the deadline is no longer applied.  Is it OK?


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@560
PS6, Line 560: fclose
Wrap into RETRY_ON_EINTR() and log in case of other error?


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@603
PS6, Line 603: close(fd_);
Wrap into RETRY_ON_EINTR() and log in case of other error?


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@1174
PS6, Line 1174: closedir(d);
Wrap into RETRY_ON_EINTR() and log in case of other error?


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@1380
PS6, Line 1380:  close(fd)
Wrap into RETRY_ON_EINTR() here as well?


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@1392
PS6, Line 1392: PosixFileLock* my_lock = reinterpret_cast<PosixFileLock*>(lock);
nit: maybe, wrap this into unique_ptr<> by the way?


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@1397
PS6, Line 1397: close(my_lock->fd_);
Wrap into RETRY_ON_EINTR() and log if case of other error?


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@1737
PS6, Line 1737: fts_close(fts)
nit: not a part of this changelist, but just curious, could this ever fail, and if yes, whether we need any handling for that.


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/semaphore.cc
File src/kudu/util/semaphore.cc:

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/semaphore.cc@a66
PS6, Line 66: 
I hope that was not the desired behavior to return from this method on EINTR error.


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc
File src/kudu/util/subprocess.cc:

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc@384
PS6, Line 384:  == STDIN_FILENO
typo?


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc@385
PS6, Line 385: dup2_ret)
dup2_ret == STDIN_FILENO ?


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc@394
PS6, Line 394:  == STDOUT_FILENO
ditto


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc@395
PS6, Line 395: dup2_ret
ditto


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc@411
PS6, Line 411:  == STDERR_FILENO
ditto


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc@412
PS6, Line 412: dup2_ret
ditto



-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 22 May 2018 22:19:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 3:

(1 comment)

Seem like we might be missing a few more:
- Socket::Close
- Log::IndexChunk::~IndexChunk
- a few things in sysinfo.cc (open and close calls)
- os-util.cc DisableCoreDumps
- various close() and open() calls in subprocess.cc

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/env_posix.cc@300
PS3, Line 300:     RETRY_ON_EINTR(err, ::close(fd_));
I think this might be tracked by KUDU-2151? worth adding it to the commit message and closing the jira when committed



-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 17 May 2018 17:47:00 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10435/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10435/5//COMMIT_MSG@9
PS5, Line 9: In order to collect our own stack traces, we send ourselves a SIGUSR2. The
> It's worth noting that for the SIGUSR2 case we set SA_RESTART on our signal
Good question. According to signal(7), not all syscalls are restarted. There's a length list of system calls that will not be restarted, and the restartability of some I/O-related syscalls depends on the properties of the file backing the fd.

In my testing I observed fallocate(2) returning EINTR, which isn't documented in that list.



-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 18 May 2018 03:08:52 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/net/socket.cc
File src/kudu/util/net/socket.cc:

http://gerrit.cloudera.org:8080/#/c/10435/3/src/kudu/util/net/socket.cc@465
PS3, Line 465:       // Continue silently when the syscall is interrupted.
             :       if (s.posix_code() == EINTR) {
             :         continue;
> Right, but take a look at tls_socket.cc: I changed TlsSocket's Write and Re
ah... the RETRY_ON_EINTR around SSL_read is actually a little concerning. I remember Alexey and I spent many hours trying to understand the various return types from that function. I seem to recall there were cases where it returns 0, and then SSL_get_error returns SSL_ERROR_WANT_READ, and errno is set to EINTR. ie SSL_read does not conform to the normal syscall interface that any error is indicated by a negative return.

If I recall correctly the behavior we saw wasn't documented properly in the man pages, etc.



-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 21 May 2018 17:39:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10435

to look at the new patch set (#5).

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................

KUDU-2427: retry more system calls on EINTR

In order to collect our own stack traces, we send ourselves a SIGUSR2. The
diagnostics log initiates collection every 60s, as do service queue
overflows. As such, it's really important that we retry every interruptible
system call rather than passing the EINTR up the call stack as a failure.

For whatever reason this happens more frequently on Ubuntu 18.04, though
maybe it's because I've placed my test directory on tmpfs. For example, I
can easily repro a crash due to non-existent retry with the following
command line:

  bin/tablet_server-test --gtest_repeat=1000 --gtest_throw_on_failure \
    --diagnostics_log_stack_traces_interval_ms=100 \
    --unlock_experimental_flags --gtest_filter=*KUDU_177

This patch also fixes KUDU-2151.

Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
---
M src/kudu/consensus/log_index.cc
M src/kudu/gutil/macros.h
M src/kudu/gutil/sysinfo.cc
M src/kudu/rpc/negotiation.cc
M src/kudu/security/tls_socket.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/util/env_posix.cc
M src/kudu/util/net/socket.cc
M src/kudu/util/os-util.cc
M src/kudu/util/os-util.h
M src/kudu/util/semaphore.cc
M src/kudu/util/subprocess.cc
12 files changed, 149 insertions(+), 117 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/10435/5
-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10435/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10435/5//COMMIT_MSG@9
PS5, Line 9: In order to collect our own stack traces, we send ourselves a SIGUSR2. The
It's worth noting that for the SIGUSR2 case we set SA_RESTART on our signal handler, so in theory the system should already automatically restart all of our syscalls for us for that signal. That also explains why we haven't seen this be a big problem as of yet. Do we have any alternate explanation?



-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 18 May 2018 02:57:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................

KUDU-2427: retry more system calls on EINTR

In order to collect its own stack traces, Kudu periodically sends itself a
SIGUSR2. The diagnostics log initiates stack collection every 60s, as do
some service queue overflow events. In theory, the collection shouldn't
affect any ongoing syscalls because the SIGUSR2 signal handler is installed
with SA_RESTART; in practice, not all syscalls are restartable, and
precisely categorizing those that are and those that aren't is difficult. As
such, it's really important that we retry every interruptible syscall rather
than surfacing the EINTR up the call stack as a failure.

For whatever reason this happens more frequently on Ubuntu 18.04, though
maybe it's because I've placed my test directory on tmpfs. For example, I
can easily repro a crash due to non-existent retry with the following
command line:

  bin/tablet_server-test --gtest_repeat=1000 --gtest_throw_on_failure \
    --diagnostics_log_stack_traces_interval_ms=100 \
    --unlock_experimental_flags --gtest_filter=*KUDU_177

This patch also fixes KUDU-2151.

Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Reviewed-on: http://gerrit.cloudera.org:8080/10435
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/consensus/log_index.cc
M src/kudu/gutil/macros.h
M src/kudu/gutil/sysinfo.cc
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_test.cc
M src/kudu/util/env-test.cc
M src/kudu/util/env_posix.cc
M src/kudu/util/net/socket.cc
M src/kudu/util/os-util.cc
M src/kudu/util/os-util.h
M src/kudu/util/pstack_watcher-test.cc
M src/kudu/util/pstack_watcher.cc
M src/kudu/util/semaphore.cc
M src/kudu/util/subprocess-test.cc
M src/kudu/util/subprocess.cc
15 files changed, 253 insertions(+), 138 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2427: retry more system calls on EINTR

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/10435 )

Change subject: KUDU-2427: retry more system calls on EINTR
......................................................................


Patch Set 6:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/consensus/log_index.cc
File src/kudu/consensus/log_index.cc:

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/consensus/log_index.cc@116
PS6, Line 116: RETRY_ON_EINTR(ret, close(fd_));
> It's highly unlikely, but is it worth logging about an error?
Done


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/gutil/sysinfo.cc
File src/kudu/gutil/sysinfo.cc:

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/gutil/sysinfo.cc@301
PS6, Line 301: RETRY_ON_EINTR(ret, close(fd));
> Is it worth logging about a failure (if it not EINTR, of course)?
Done


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/rpc/negotiation.cc
File src/kudu/rpc/negotiation.cc:

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/rpc/negotiation.cc@119
PS6, Line 119:     RETRY_ON_EINTR(ready, ppoll(&poll_fd, 1, &ts, nullptr));
             : #else
             :     RETRY_ON_EINTR(ready, poll(&poll_fd, 1, remaining.ToMilliseconds()));
> This update changes the logic a bit so the deadline is no longer applied.  
You're right; this is a mistake. I'll revert this back to the way it was, with EINTR handled in the outer loop rather than inside the RETRY_ON_EINTR inner loop.


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@560
PS6, Line 560: fclose
> Wrap into RETRY_ON_EINTR() and log in case of other error?
Done


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@603
PS6, Line 603: close(fd_);
> Wrap into RETRY_ON_EINTR() and log in case of other error?
Done


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@1174
PS6, Line 1174: closedir(d);
> Wrap into RETRY_ON_EINTR() and log in case of other error?
The closedir() manpage doesn't mention returning EINTR, or returning other errors set by close() either.


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@1380
PS6, Line 1380:  close(fd)
> Wrap into RETRY_ON_EINTR() here as well?
Done


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@1392
PS6, Line 1392: PosixFileLock* my_lock = reinterpret_cast<PosixFileLock*>(lock);
> nit: maybe, wrap this into unique_ptr<> by the way?
Done


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@1397
PS6, Line 1397: close(my_lock->fd_);
> Wrap into RETRY_ON_EINTR() and log if case of other error?
Done


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/env_posix.cc@1737
PS6, Line 1737: fts_close(fts)
> nit: not a part of this changelist, but just curious, could this ever fail,
The fts manpage says:

  ERRORS
       The function fts_open() may fail and set errno for any of the errors specified for open(2) and malloc(3).

       The function fts_close() may fail and set errno for any of the errors specified for chdir(2) and close(2).

       The  functions  fts_read()  and  fts_children()  may  fail  and  set errno for any of the errors specified for
       chdir(2), malloc(3), opendir(3), readdir(3), and stat(2).

So yes, we should wrap fts_open and fts_close.


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/semaphore.cc
File src/kudu/util/semaphore.cc:

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/semaphore.cc@a66
PS6, Line 66: 
> I hope that was not the desired behavior to return from this method on EINT
Todd and I discussed that on Slack. Returning false on EINTR is not necessarily incorrect, but it leads to more TryAcquire failures than are strictly necessary. It seems to have been this way from day one. We agreed that changing it should be OK.


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc
File src/kudu/util/subprocess.cc:

http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc@384
PS6, Line 384:  == STDIN_FILENO
> typo?
Oops, good catch. Done.


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc@385
PS6, Line 385: dup2_ret)
> dup2_ret == STDIN_FILENO ?
Done


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc@394
PS6, Line 394:  == STDOUT_FILENO
> ditto
Done


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc@395
PS6, Line 395: dup2_ret
> ditto
Done


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc@411
PS6, Line 411:  == STDERR_FILENO
> ditto
Done


http://gerrit.cloudera.org:8080/#/c/10435/6/src/kudu/util/subprocess.cc@412
PS6, Line 412: dup2_ret
> ditto
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/10435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6cce03c4e1b2be32c1910382737526082fc99966
Gerrit-Change-Number: 10435
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 23 May 2018 19:04:25 +0000
Gerrit-HasComments: Yes