Posted to user@hawq.apache.org by xi...@iluvatar.ai on 2018/10/30 09:48:52 UTC

Why does "hawq stop cluster" not stop gpsyncmaster?

Hi! 

After I run "hawq stop cluster -a", I find that gpadmin processes are still running:

gpadmin 61866 0.4 5.2 811448 419620 ? S 17:29 0:00 /usr/local/apache-hawq/bin/gpsyncmaster -D /data/hawq/masterdd -i -p 1809 
gpadmin 61882 0.0 0.0 302688 7200 ? Ss 17:29 0:00 postgres: port 1809, logger process 
gpadmin 61883 0.0 0.0 812000 7384 ? S 17:29 0:00 postgres: port 1809, WAL Redo Server process 
gpadmin 61907 0.0 0.1 812300 8128 ? Ss 17:29 0:00 postgres: port 1809, gpsyncagent process con2 idle 
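
To check for such leftovers after a stop, the process list can be filtered by port. Below is a minimal Python sketch, assuming `ps aux`-style text as input; `find_leftover_pids` is a hypothetical helper written for illustration and is not part of HAWQ.

```python
import re

def find_leftover_pids(ps_text, port):
    """Return PIDs of gpsyncmaster / postgres processes bound to `port`.

    Hypothetical helper, not part of HAWQ. Assumes `ps aux` column layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    pattern = re.compile(
        r"gpsyncmaster\b.*-p %d\b|postgres: port\s+%d\b" % (port, port)
    )
    pids = []
    for line in ps_text.splitlines():
        # Split into at most 11 fields so the full command stays in one piece.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        if pattern.search(fields[10]):
            pids.append(int(fields[1]))
    return pids
```

The text passed in could come from something like `subprocess.check_output(["ps", "aux"])` decoded to a string on the standby host. An empty result would mean nothing is left on that port.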

Then running "hawq start cluster -a" failed:

20181030:17:29:05:2789929 hawq_start:dx-computing:gpadmin-[INFO]:-Starting standby master '192.168.10.18' 
20181030:17:29:05:2789929 hawq_start:dx-computing:gpadmin-[INFO]:-Start standby master service 
20181030:17:29:04:061879 hawqstandbywatch.py:dx-computing2:gpadmin-[INFO]:-Checking standby master status 
20181030:17:29:04:061879 hawqstandbywatch.py:dx-computing2:gpadmin-[INFO]:-Monitoring logs 
20181030:17:29:08:061879 hawqstandbywatch.py:dx-computing2:gpadmin-[INFO]:-checking if syncmaster is running 
20181030:17:29:08:061879 hawqstandbywatch.py:dx-computing2:gpadmin-[INFO]:-syncmaster appears ok, pid 61866 
20181030:17:29:09:2789929 hawq_start:dx-computing:gpadmin-[INFO]:-Standby master started successfully 
20181030:17:29:09:2789929 hawq_start:dx-computing:gpadmin-[INFO]:-Starting master node '192.168.10.17' 
20181030:17:29:09:2789929 hawq_start:dx-computing:gpadmin-[INFO]:-Start master service 
20181030:17:29:10:2789929 hawq_start:dx-computing:gpadmin-[INFO]:-Checking if standby is synced with master 
20181030:17:29:10:2789929 hawq_start:dx-computing:gpadmin-[ERROR]:-Failed to connect to database, this script can only be run when the database is up 
Traceback (most recent call last):
  File "/usr/local/apache-hawq/bin/hawq_ctl", line 1459, in <module>
    start_hawq(opts, hawq_dict)
  File "/usr/local/apache-hawq/bin/hawq_ctl", line 1233, in start_hawq
    instance.run()
  File "/usr/local/apache-hawq/bin/hawq_ctl", line 765, in run
    check_return_code(self._start_all_nodes())
  File "/usr/local/apache-hawq/bin/hawq_ctl", line 701, in _start_all_nodes
    check_return_code(self.start_master(), logger, "Master start failed, exit", \
  File "/usr/local/apache-hawq/bin/hawq_ctl", line 618, in start_master
    sync_result = self._check_standby_sync()
  File "/usr/local/apache-hawq/bin/hawq_ctl", line 671, in _check_standby_sync
    for row in rows:
UnboundLocalError: local variable 'rows' referenced before assignment
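
The UnboundLocalError looks like a secondary failure. The [ERROR] line shows that the database connection failed, and the script then went on to loop over a result that was never assigned. Below is a minimal Python sketch of that pattern and of a guarded variant. The hawq_ctl source is not shown here, so all function names below are hypothetical stand-ins and the structure is an assumption, not the actual implementation.

```python
import logging

logger = logging.getLogger("sketch")

def run_query_that_fails():
    # Hypothetical stand-in for a query issued while the master
    # is not accepting connections.
    raise RuntimeError("Failed to connect to database")

def check_standby_sync_buggy():
    try:
        rows = run_query_that_fails()
    except RuntimeError as exc:
        logger.error("%s", exc)  # error is logged, but execution continues
    for row in rows:  # 'rows' was never assigned: UnboundLocalError
        pass
    return True

def check_standby_sync_guarded():
    rows = None  # always bound, even if the query fails
    try:
        rows = run_query_that_fails()
    except RuntimeError as exc:
        logger.error("%s", exc)
    if rows is None:
        return False  # report the connection failure instead of crashing
    for row in rows:
        pass
    return True
```

If this assumption holds, the traceback only hides the underlying problem, which is the failed connection to the master.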

So why does "hawq stop cluster" not stop gpsyncmaster on the standby node?

I am using HAWQ 2.2. Would upgrading solve this?