Posted to reviews@impala.apache.org by "Philip Zeyliger (Code Review)" <ge...@cloudera.org> on 2018/09/20 04:12:29 UTC

[Impala-ASF-CR] Workaround docker/kernel bug causing test-with-docker to sometimes hang.

Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11481


Change subject: Workaround docker/kernel bug causing test-with-docker to sometimes hang.
......................................................................

Workaround docker/kernel bug causing test-with-docker to sometimes hang.

I've observed that builds of test-with-docker that have "suite
parallelism" sometimes hang when the Docker containers are
being created. (The implementation had multiple threads calling
"docker create" simultaneously.) From trawling the mailing lists,
it looks like it may be a bug in Docker or in the kernel. I've never
caught a hang in progress, so I haven't been able to strace it.

A hopeful workaround is to serialize the "docker create" calls, which is
easy and harmless: "docker create" is usually quick (subsecond), while
the overall runtime here is hours or more.
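
A minimal sketch of the serialization idea, assuming one worker thread per suite: a single process-wide threading.Lock is held only around "docker create", so container creation happens one at a time while the suites themselves still run in parallel. This is not the actual diff to docker/test-with-docker.py; the names `docker_create`, `run_suite`, `make_fake_run`, the image name `example-image`, and the injectable `run` parameter are all hypothetical, added so the locking can be exercised without Docker.

```python
import subprocess
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical process-wide lock: at most one "docker create" runs at a time.
_DOCKER_CREATE_LOCK = threading.Lock()


def _run_docker(cmd):
    """Default runner: executes the command and returns its stripped stdout."""
    return subprocess.check_output(cmd, universal_newlines=True).strip()


def docker_create(args, run=_run_docker):
    """Runs "docker create <args>" while holding _DOCKER_CREATE_LOCK.

    `run` is injectable (hypothetical) so the locking can be tested without Docker.
    """
    with _DOCKER_CREATE_LOCK:
        return run(["docker", "create"] + list(args))


def run_suite(name, run=_run_docker):
    """Hypothetical per-suite worker: only the create step is serialized."""
    container_id = docker_create(["--name", name, "example-image"], run=run)
    # Starting the container and waiting for the suite would follow here,
    # outside the lock, so the suites themselves still run concurrently.
    return container_id


def make_fake_run(delay=0.01):
    """Returns a fake runner that records how many creates overlap."""
    stats = {"in_flight": 0, "max_in_flight": 0, "calls": 0}
    guard = threading.Lock()

    def fake_run(cmd):
        with guard:
            stats["calls"] += 1
            stats["in_flight"] += 1
            stats["max_in_flight"] = max(stats["max_in_flight"], stats["in_flight"])
        time.sleep(delay)  # stand-in for a subsecond "docker create"
        with guard:
            stats["in_flight"] -= 1
        # cmd is ["docker", "create", "--name", name, image]
        return "id-" + cmd[3]

    return fake_run, stats


if __name__ == "__main__":
    fake_run, stats = make_fake_run()
    with ThreadPoolExecutor(max_workers=6) as pool:
        ids = list(pool.map(lambda n: run_suite(n, run=fake_run),
                            ["suite-%d" % i for i in range(6)]))
    print(ids)
    print(stats["max_in_flight"])  # 1: the creates never overlapped
```

The lock scope is deliberately narrow: since each create is subsecond, holding the lock only for that call costs at most a few seconds of waiting across all suites, which is negligible next to a multi-hour run.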

With this change, I was able to run test-with-docker with
--suite-concurrency=6 on a c5.8xlarge in AWS, with a total runtime of
1h35m.

The hangs are intermittent and, in the typical case, make runtimes
inconsistent, because less parallelism is achieved while one of the
"docker create" calls is hung. (I've seen them resume after one of
the other containers finishes.) We'll find out over time whether
this stabilizes the runs or has no effect.

Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
---
M docker/test-with-docker.py
1 file changed, 47 insertions(+), 40 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/11481/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11481
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
Gerrit-Change-Number: 11481
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>

[Impala-ASF-CR] IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11481 )

Change subject: IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3220/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
Gerrit-Change-Number: 11481
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Sep 2018 22:40:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] Workaround docker/kernel bug causing test-with-docker to sometimes hang.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11481 )

Change subject: Workaround docker/kernel bug causing test-with-docker to sometimes hang.
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/729/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
Gerrit-Change-Number: 11481
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 04:49:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11481 )

Change subject: IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.
......................................................................


Patch Set 2: Code-Review+2

Carrying the +1's and adding them up into a +2.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
Gerrit-Change-Number: 11481
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Sep 2018 22:40:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11481 )

Change subject: IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/799/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
Gerrit-Change-Number: 11481
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Sep 2018 23:35:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11481 )

Change subject: IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.
......................................................................

IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.

I've observed that builds of test-with-docker that have "suite
parallelism" sometimes hang when the Docker containers are
being created. (The implementation had multiple threads calling
"docker create" simultaneously.) From trawling the mailing lists,
it looks like it may be a bug in Docker or in the kernel. I've never
caught a hang in progress, so I haven't been able to strace it.

A hopeful workaround is to serialize the "docker create" calls, which is
easy and harmless: "docker create" is usually quick (subsecond), while
the overall runtime here is hours or more.

With this change, I was able to run test-with-docker with
--suite-concurrency=6 on a c5.9xlarge in AWS, with a total runtime of
1h35m.

The hangs are intermittent and, in the typical case, make runtimes
inconsistent, because less parallelism is achieved while one of the
"docker create" calls is hung. (I've seen them resume after one of the
other containers finishes.) We'll find out over time whether this
stabilizes the runs or has no effect.

Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
Reviewed-on: http://gerrit.cloudera.org:8080/11481
Reviewed-by: Philip Zeyliger <ph...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M docker/test-with-docker.py
1 file changed, 47 insertions(+), 40 deletions(-)

Approvals:
  Philip Zeyliger: Looks good to me, approved
  Impala Public Jenkins: Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
Gerrit-Change-Number: 11481
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>

[Impala-ASF-CR] Workaround docker/kernel bug causing test-with-docker to sometimes hang.

Posted by "Laszlo Gaal (Code Review)" <ge...@cloudera.org>.
Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/11481 )

Change subject: Workaround docker/kernel bug causing test-with-docker to sometimes hang.
......................................................................


Patch Set 1: Code-Review+1

(1 comment)

LGTM, feel free to self-promote to +2

http://gerrit.cloudera.org:8080/#/c/11481/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11481/1//COMMIT_MSG@21
PS1, Line 21: c5.8xlarge
micro-nit: there are only c5.9xlarge and c4.8xlarge




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
Gerrit-Change-Number: 11481
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 15:27:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Hello Laszlo Gaal, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11481

to look at the new patch set (#2).

Change subject: IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.
......................................................................

IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.

I've observed that builds of test-with-docker that have "suite
parallelism" sometimes hang when the Docker containers are
being created. (The implementation had multiple threads calling
"docker create" simultaneously.) From trawling the mailing lists,
it looks like it may be a bug in Docker or in the kernel. I've never
caught a hang in progress, so I haven't been able to strace it.

A hopeful workaround is to serialize the "docker create" calls, which is
easy and harmless: "docker create" is usually quick (subsecond), while
the overall runtime here is hours or more.

With this change, I was able to run test-with-docker with
--suite-concurrency=6 on a c5.9xlarge in AWS, with a total runtime of
1h35m.

The hangs are intermittent and, in the typical case, make runtimes
inconsistent, because less parallelism is achieved while one of the
"docker create" calls is hung. (I've seen them resume after one of the
other containers finishes.) We'll find out over time whether this
stabilizes the runs or has no effect.

Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
---
M docker/test-with-docker.py
1 file changed, 47 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/11481/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
Gerrit-Change-Number: 11481
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>

[Impala-ASF-CR] IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11481 )

Change subject: IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11481/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11481/1//COMMIT_MSG@21
PS1, Line 21: c5.8xlarge
> micro-nit: there are only c5.9xlarge and c4.8xlarge
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
Gerrit-Change-Number: 11481
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Sep 2018 22:39:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] Workaround docker/kernel bug causing test-with-docker to sometimes hang.

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/11481 )

Change subject: Workaround docker/kernel bug causing test-with-docker to sometimes hang.
......................................................................


Patch Set 1: Code-Review+1

lgtm
Maybe creating a Jira and mentioning it in the commit message would make this easier to track in the future.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
Gerrit-Change-Number: 11481
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Sep 2018 11:08:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/11481 )

Change subject: IMPALA-7624: Workaround docker/kernel bug causing test-with-docker to sometimes hang.
......................................................................


Patch Set 2: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3e44db7a6ce08a42d6fe574d7348332578cd9e51
Gerrit-Change-Number: 11481
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Laszlo Gaal <la...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Comment-Date: Wed, 26 Sep 2018 02:20:44 +0000
Gerrit-HasComments: No