Posted to issues@impala.apache.org by "Lars Volker (JIRA)" <ji...@apache.org> on 2017/05/17 15:22:04 UTC

[jira] [Resolved] (IMPALA-5208) Forked breakpad process blocks indefinitely for WaitForContinueSignal and fails new Impalad process at startup

     [ https://issues.apache.org/jira/browse/IMPALA-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Volker resolved IMPALA-5208.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.9.0

IMPALA-5187, IMPALA-5208: Bump Breakpad Version, undo IMPALA-3794

This change switches to a new Breakpad version, which includes fixes for
Breakpad bugs #681 and #728. The toolchain change was reviewed here:
https://gerrit.cloudera.org/6866

The change also undoes the workaround introduced in IMPALA-3794.

In addition to running test_breakpad.py in a loop for a while, I tested
Then I verified that the test fails with the old toolchain version
(88e5b2) and works with the new one (ffe3e4).

To test #728 I added a sleep() call before SendContinueSignalToChild()
and then killed the parent process, manually observing that the child
would die, too.
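
The manual check above can be approximated with a small, generic sketch of a pipe-based "continue" handshake. This is not Breakpad's code: the function name child_sees_eof and its parameters are hypothetical, and the cause shown here (the forked child keeping its own copy of the pipe's write end) is an assumption used for illustration, not something stated in this change. The parent's death is simulated by closing its pipe descriptors, which is what the kernel does when a process is killed.

```python
import os
import select


def child_sees_eof(close_write_end_in_child: bool, timeout: float = 1.0) -> bool:
    """Fork a child that waits on a pipe for a 'continue' byte.

    Returns True if the child observes EOF after the parent side goes away,
    False if it is still waiting when the timeout expires.
    All names here are hypothetical; this is not Breakpad's implementation.
    """
    wait_r, wait_w = os.pipe()      # handshake pipe: parent writes, child reads
    report_r, report_w = os.pipe()  # child reports what it observed

    pid = os.fork()
    if pid == 0:
        # Child: optionally drop the inherited write end before waiting.
        os.close(report_r)
        if close_write_end_in_child:
            os.close(wait_w)
        ready, _, _ = select.select([wait_r], [], [], timeout)
        # EOF shows up as "readable" with read() returning no bytes.
        got_eof = bool(ready) and os.read(wait_r, 1) == b""
        os.write(report_w, b"1" if got_eof else b"0")
        os._exit(0)

    # Parent: never send the continue byte. Closing both ends stands in for
    # the parent being killed, since process exit closes all descriptors.
    os.close(report_w)
    os.close(wait_r)
    os.close(wait_w)
    result = os.read(report_r, 1)
    os.close(report_r)
    os.waitpid(pid, 0)
    return result == b"1"


if __name__ == "__main__":
    print("child closed write end:", child_sees_eof(True))
    print("child kept write end:  ", child_sees_eof(False, timeout=0.5))
```

In this sketch a child that closes its copy of the write end wakes up with EOF as soon as the parent side disappears, while a child that keeps the copy only stops waiting because of the artificial timeout. Without a timeout it would block indefinitely, which is the kind of hang the issue title describes.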

Change-Id: Ic541ccd565f2bb51f68c085747fc47ae8c905d19
Reviewed-on: http://gerrit.cloudera.org:8080/6883
Reviewed-by: Lars Volker <lv...@cloudera.com>
Tested-by: Impala Public Jenkins

> Forked breakpad process blocks indefinitely for WaitForContinueSignal and fails new Impalad process at startup 
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-5208
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5208
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.7.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Lars Volker
>            Priority: Critical
>             Fix For: Impala 2.9.0
>
>
> New Impala process failing to start 
> {code}
> E0414 10:17:56.761270 893048 logging.cc:121] stderr will be logged to this file.
> E0414 10:17:59.897265 893215 thrift-server.cc:182] ThriftServer 'backend' (on port: 22000) exited due to TException: Could not bind: Transport endpoint is not connected
> E0414 10:17:59.897356 893048 thrift-server.cc:171] ThriftServer 'backend' (on port: 22000) did not start correctly
> F0414 10:17:59.899677 893048 impalad-main.cc:89] ThriftServer 'backend' (on port: 22000) did not start correctly
> . Impalad exiting.
> {code}
> Call stack from hung breakpad fork
> {code}
> (gdb) bt
> #0  0x0000000001b80c9f in google_breakpad::ExceptionHandler::WaitForContinueSignal() ()
> #1  0x0000000001b80ddd in google_breakpad::ExceptionHandler::ThreadEntry(void*) ()
> #2  0x0000000001b805db in google_breakpad::ExceptionHandler::GenerateDump(google_breakpad::ExceptionHandler::CrashContext*) ()
> #3  0x0000000000000000 in ?? ()
> {code}
> PS output
> {code} 
> ps -e --format='pid ppid pgid user args' | grep impala
>  383619       1  383612 impala   python2.7 /usr/lib64/cmf/agent/build/env/bin/cmf-redactor /usr/lib64/cmf/service/impala/impala.sh impalad impalad_flags false
>  405348  368389  405348 mmokhtar python /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.14/bin/../lib/impala-shell/impala_shell.py -k
>  852304  852233  852304 mmokhtar python /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.14/bin/../lib/impala-shell/impala_shell.py
>  872925       1  383612 impala   /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.14/lib/impala/sbin-retail/impalad --flagfile=/run/cloudera-scm-agent/process/60723-impala-IMPALAD/impala-conf/impalad_flags
>  880656  852233  880656 mmokhtar python /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.14/bin/../lib/impala-shell/impala_shell.py -k
>  881074  852233  881074 mmokhtar python /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.14/bin/../lib/impala-shell/impala_shell.py -k -i va1335.halxg.cloudera.com
>  883949  874541  883948 mmokhtar grep --color=auto impala
> {code}


