Posted to issues@hawq.apache.org by "Kuien Liu (JIRA)" <ji...@apache.org> on 2017/09/25 05:09:02 UTC

[jira] [Commented] (HAWQ-1529) "segment resource manager" will NOT exit when postmaster died

    [ https://issues.apache.org/jira/browse/HAWQ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178535#comment-16178535 ] 

Kuien Liu commented on HAWQ-1529:
---------------------------------

After I kill -9 the postmaster on a segment, the segment log prints the following every 30 seconds:


2017-09-25 13:06:30.843258 CST,,,p97782,th345595264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 127.0.0.1",,,,,,,0,,"network_utils.c",210,
2017-09-25 13:06:30.843307 CST,,,p97782,th345595264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 10.101.212.74",,,,,,,0,,"network_utils.c",210,
2017-09-25 13:06:30.843327 CST,,,p97782,th345595264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 192.16.0.1",,,,,,,0,,"network_utils.c",210,


2017-09-25 13:07:00.876002 CST,,,p97782,th345595264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 127.0.0.1",,,,,,,0,,"network_utils.c",210,
2017-09-25 13:07:00.876039 CST,,,p97782,th345595264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 10.101.212.74",,,,,,,0,,"network_utils.c",210,
2017-09-25 13:07:00.876051 CST,,,p97782,th345595264,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 192.16.0.1",,,,,,,0,,"network_utils.c",210,
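
For illustration only, here is a minimal sketch of one common way a child process can notice that its parent is gone: remember the parent's pid at startup, poll with a bounded timeout, and compare getppid() against the saved pid on every wake-up. The names parent_is_alive and run_loop are hypothetical and are not HAWQ functions. This is not the actual fix for this issue, and whether the resource manager's loop in rmcomm_AsyncComm.c can be changed this way has not been verified.

```c
#include <poll.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a HAWQ function): returns true while the
 * process that was our parent at startup is still our parent. When the
 * parent dies, the kernel re-parents the child, so getppid() no longer
 * returns the saved pid.
 */
static bool parent_is_alive(pid_t parent_at_startup)
{
    return getppid() == parent_at_startup;
}

/*
 * Hypothetical main loop: check the parent before each poll() call and
 * use a bounded timeout so the check runs regularly even when no file
 * descriptor becomes ready. Returns the number of iterations completed
 * before the parent was found dead or max_iterations was reached.
 */
static int run_loop(struct pollfd *fds, nfds_t nfds, int timeout_ms,
                    pid_t parent_at_startup, int max_iterations)
{
    int done;

    for (done = 0; done < max_iterations; done++) {
        if (!parent_is_alive(parent_at_startup))
            break;              /* parent gone: stop instead of waiting on */

        /* Real code would handle the ready descriptors and errors here. */
        (void) poll(fds, nfds, timeout_ms);
    }
    return done;
}
```

A real process would save getppid() once at startup, pass it to the loop, and exit cleanly when the loop returns early. The timeout passed to poll() bounds how long an orphaned process can linger before it notices.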



> "segment resource manager" will NOT exit when postmaster died
> -------------------------------------------------------------
>
>                 Key: HAWQ-1529
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1529
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Kuien Liu
>            Assignee: Radar Lei
>
> If I send SIGKILL to the segment's postmaster with 'kill -9', the postmaster dies, BUT the "segment resource manager" and "logger process" stay alive and keep flushing "WARNING" every 30s.
> To my understanding, the "logger process" is waiting for the "segment resource manager", but the resource manager does not check whether the postmaster is still alive and keeps waiting. Does that make sense? Why not quit once the postmaster is gone?
> The call stack of RM when postmaster is killed:
> #0  0x00007f19023ccab6 in poll () from /lib64/libc.so.6
> #1  0x0000000000a48c9e in processAllCommFileDescs () at rmcomm_AsyncComm.c:156
> #2  0x0000000000a8ce5e in MainHandlerLoop_RMSEG () at resourcemanager_RMSEG.c:166
> #3  0x0000000000a8cba3 in ResManagerMainSegment2ndPhase () at resourcemanager_RMSEG.c:71
> #4  0x0000000000a8d966 in ResManagerMain (argc=0x3, argv=0x7fffa018b890) at resourcemanager.c:346
> #5  0x0000000000a8db45 in ResManagerProcessStartup () at resourcemanager.c:411
> #6  0x0000000000899b89 in CommenceNormalOperations () at postmaster.c:3673
> #7  0x000000000089a562 in do_reaper () at postmaster.c:4021
> #8  0x00000000008969bb in ServerLoop () at postmaster.c:2136
> #9  0x0000000000895a78 in PostmasterMain (argc=0xc, argv=0x229a730) at postmaster.c:1454
> #10 0x00000000007b185d in main (argc=0xc, argv=0x229a730) at main.c:226
> #11 0x00007f190231e994 in __libc_start_main () from /lib64/libc.so.6
> #12 0x00000000004bde89 in _start ()



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)