Posted to issues@hawq.apache.org by "Kuien Liu (JIRA)" <ji...@apache.org> on 2017/09/25 06:53:02 UTC
[jira] [Comment Edited] (HAWQ-1529) "segment resource manager" will NOT exit when postmaster died
[ https://issues.apache.org/jira/browse/HAWQ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178608#comment-16178608 ]
Kuien Liu edited comment on HAWQ-1529 at 9/25/17 6:52 AM:
----------------------------------------------------------
Here is a possible patch. It looks strange, but it does work.
{code:diff}
--- a/src/backend/resourcemanager/resourcemanager_RMSEG.c
+++ b/src/backend/resourcemanager/resourcemanager_RMSEG.c
@@ -26,6 +26,7 @@
 #include "communication/rmcomm_MessageServer.h"
 #include "communication/rmcomm_RMSEG2RM.h"
 #include "resourceenforcer/resourceenforcer.h"
+#include "storage/pmsignal.h" /* PostmasterIsAlive */
 #include "cdb/cdbtmpdir.h"
 
 int ResManagerMainSegment2ndPhase(void)
@@ -156,7 +157,7 @@ int MainHandlerLoop_RMSEG(void)
 	DRMGlobalInstance->ResourceManagerStartTime = gettime_microsec();
 	while( DRMGlobalInstance->ResManagerMainKeepRun ) {
-		if (!PostmasterIsAlive(true)) {
+		if (0 == PostmasterIsAlive(true)) {
 			DRMGlobalInstance->ResManagerMainKeepRun = false;
 			elog(LOG, "Postmaster is not alive, resource manager exits");
 			break;
{code}
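As far as the C language is concerned, the two spellings of the check in the second hunk are identical: the standard defines `!E` as `(0 == E)`. The minimal sketch below verifies this for every value of a `char`-sized boolean, assuming (as in PostgreSQL-derived code of that era) that `bool` is a `typedef` for `char`; `pg_bool`, `fake_is_alive`, `checks_agree` and `all_values_agree` are hypothetical names, not HAWQ code. If the rewritten comparison cannot change behavior by itself, the part of the patch that plausibly matters is the added `#include`, which puts the real prototype of `PostmasterIsAlive` in scope. Without it, the compiler may fall back to an implicit declaration returning `int` (GCC historically accepted this with only a warning), and calling a `char`-returning function through that declaration is undefined behavior. This explanation is an assumption, not something confirmed in this thread.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for PostgreSQL-era `typedef char bool`. */
typedef char pg_bool;

/* Hypothetical stand-in for PostmasterIsAlive(): echoes its input. */
static pg_bool fake_is_alive(pg_bool v)
{
    return v;
}

/* Returns 1 if `!x` and `0 == x` give the same result for this value. */
static int checks_agree(pg_bool v)
{
    int negated = !fake_is_alive(v);
    int compared = (0 == fake_is_alive(v));
    return negated == compared;
}

/* Checks every representable char value; returns 1 if all agree. */
static int all_values_agree(void)
{
    for (int i = CHAR_MIN; i <= CHAR_MAX; i++) {
        if (!checks_agree((pg_bool)i))
            return 0;
    }
    return 1;
}
```

With a correct prototype in scope, `all_values_agree()` returns 1, so any difference in runtime behavior has to come from how the call is declared, not from how its result is tested.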
> "segment resource manager" will NOT exit when postmaster died
> -------------------------------------------------------------
>
> Key: HAWQ-1529
> URL: https://issues.apache.org/jira/browse/HAWQ-1529
> Project: Apache HAWQ
> Issue Type: Improvement
> Components: Core
> Reporter: Kuien Liu
> Assignee: Radar Lei
>
> If I send SIGKILL to the segment's postmaster with 'kill -9', the postmaster dies, BUT the "segment resource manager" and the "logger process" stay alive and keep emitting a "WARNING" every 30s.
> As I understand it, the "logger process" is waiting for the "segment resource manager", but the resource manager never detects that the postmaster is gone and keeps waiting. Does that make sense? Why not exit once the postmaster is gone?
> The call stack of the RM after the postmaster is killed:
> #0 0x00007f19023ccab6 in poll () from /lib64/libc.so.6
> #1 0x0000000000a48c9e in processAllCommFileDescs () at rmcomm_AsyncComm.c:156
> #2 0x0000000000a8ce5e in MainHandlerLoop_RMSEG () at resourcemanager_RMSEG.c:166
> #3 0x0000000000a8cba3 in ResManagerMainSegment2ndPhase () at resourcemanager_RMSEG.c:71
> #4 0x0000000000a8d966 in ResManagerMain (argc=0x3, argv=0x7fffa018b890) at resourcemanager.c:346
> #5 0x0000000000a8db45 in ResManagerProcessStartup () at resourcemanager.c:411
> #6 0x0000000000899b89 in CommenceNormalOperations () at postmaster.c:3673
> #7 0x000000000089a562 in do_reaper () at postmaster.c:4021
> #8 0x00000000008969bb in ServerLoop () at postmaster.c:2136
> #9 0x0000000000895a78 in PostmasterMain (argc=0xc, argv=0x229a730) at postmaster.c:1454
> #10 0x00000000007b185d in main (argc=0xc, argv=0x229a730) at main.c:226
> #11 0x00007f190231e994 in __libc_start_main () from /lib64/libc.so.6
> #12 0x00000000004bde89 in _start ()
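The stack shows the RM blocked in `poll()` inside `processAllCommFileDescs()`, so in a single-threaded loop like `MainHandlerLoop_RMSEG()` a liveness check can only run after `poll()` returns. The sketch below models that loop shape in isolation, under stated assumptions: it waits in `poll()` with a bounded timeout and, on each iteration, checks whether its parent is still alive by comparing `getppid()` with a PID recorded at startup. This is a common Unix technique and roughly what pre-9.2 PostgreSQL does in `PostmasterIsAlive(true)` for direct children, but it is not HAWQ's code. `recorded_parent_pid`, `parent_is_alive` and `run_loop` are hypothetical names, and parent death is simulated by recording a PID that does not match.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: the parent (postmaster) PID recorded at startup. */
static pid_t recorded_parent_pid;

/* Direct-child liveness check: if the parent dies, this process is
 * re-parented (to init or a subreaper), so getppid() stops matching
 * the recorded PID. */
static bool parent_is_alive(void)
{
    return getppid() == recorded_parent_pid;
}

/* Hypothetical main loop. Each iteration waits at most timeout_ms in
 * poll() (no descriptors here, so it simply sleeps), and the next
 * iteration re-checks the parent. Returns the number of completed
 * iterations; stops early once the parent is gone. */
static int run_loop(int max_iterations, int timeout_ms)
{
    assert(timeout_ms >= 0);
    bool keep_running = true;
    int iterations = 0;

    while (keep_running && iterations < max_iterations) {
        if (!parent_is_alive()) {
            keep_running = false;
            fprintf(stderr, "parent is not alive, loop exits\n");
            break;
        }
        /* Bounded wait: a dead parent is noticed within ~timeout_ms. */
        (void)poll(NULL, 0, timeout_ms);
        iterations++;
    }
    return iterations;
}
```

In a real process the recorded PID would come from startup state; `max_iterations` exists only to keep the sketch finite. The key design point is that the wait is bounded, so the check gets a chance to run even when no descriptor ever becomes ready.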
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)