Posted to issues@hawq.apache.org by "Radar Lei (JIRA)" <ji...@apache.org> on 2016/07/09 04:06:10 UTC

[jira] [Commented] (HAWQ-901) hawq init failed: hawqstandbywatch.py:test5:gpadmin-[WARNING]:-syncmaster not running

    [ https://issues.apache.org/jira/browse/HAWQ-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368924#comment-15368924 ] 

Radar Lei commented on HAWQ-901:
--------------------------------

Since the recent standby changes, standby startup can take longer in some cases, which causes the standby start to fail with a timeout. The current timeout is too small; I plan to add a retry loop to fix it.
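
A minimal sketch of what such a retry loop could look like, for illustration only. It is not the actual HAWQ code: the function name wait_until_running, its parameters, and the default attempt count and interval are all hypothetical, and the real check for syncmaster would be supplied by the caller as is_running.

```python
import time
from typing import Callable


def wait_until_running(
    is_running: Callable[[], bool],
    max_attempts: int = 10,
    interval_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_running() until it returns True or the attempts run out.

    All names and defaults here are hypothetical; is_running stands in
    for whatever check decides that syncmaster is up.
    """
    for attempt in range(1, max_attempts + 1):
        if is_running():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < max_attempts:
            sleep(interval_seconds)
    return False
```

With a loop like this, a slow standby start only fails after max_attempts checks instead of after a single fixed wait. The sleep parameter is injectable so the loop can be exercised without real delays.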

> hawq init failed: hawqstandbywatch.py:test5:gpadmin-[WARNING]:-syncmaster not running
> -------------------------------------------------------------------------------------
>
>                 Key: HAWQ-901
>                 URL: https://issues.apache.org/jira/browse/HAWQ-901
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Command Line Tools
>            Reporter: Ming LI
>            Assignee: Radar Lei
>
> Error message in ~/hawqAdminLogs/hawq_init_XXXXXXXX.log
> ------------------------------------------------------------------------------------
> 20160706:06:45:53:006218 hawq_start:test1:gpadmin-[INFO]:-Start hawq with args: ['start', 'standby']
> 20160706:06:45:53:006218 hawq_start:test1:gpadmin-[INFO]:-Gathering information and validating the environment...
> 20160706:06:45:53:006218 hawq_start:test1:gpadmin-[INFO]:-Start standby master service
> 20160706:06:46:02:006218 hawq_start:test1:gpadmin-[INFO]:-Checking standby master status
> 20160706:06:45:55:004418 hawqstandbywatch.py:test5:gpadmin-[INFO]:-Monitoring logs
> 20160706:06:46:00:004418 hawqstandbywatch.py:test5:gpadmin-[INFO]:-checking if syncmaster is running
> 20160706:06:46:02:004418 hawqstandbywatch.py:test5:gpadmin-[WARNING]:-syncmaster not running
> 20160706:06:46:02:006218 hawq_start:test1:gpadmin-[ERROR]:-Standby master start failed, exit
> 20160706:06:46:02:003999 hawqinit.sh:test5:gpadmin-[ERROR]:-Start HAWQ standby failed
> ------------------------------------------------------------------------------
> (1) I suspect the root cause may be that we wait only 5 seconds before checking the standby running status, and this interval is too short. Could you please first change the standby running status check from a single 5-second wait to a loop, like the recovery running status check on the master?
> (2) If the error 'syncmaster not running' leads to init failure, we should change its log level from [WARNING] to [ERROR].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)