Posted to issues@hawq.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/07/04 02:22:11 UTC

[jira] [Commented] (HAWQ-812) Activate standby master failed after create a new database

    [ https://issues.apache.org/jira/browse/HAWQ-812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360748#comment-15360748 ] 

ASF GitHub Bot commented on HAWQ-812:
-------------------------------------

GitHub user ztao1987 commented on the issue:

    https://github.com/apache/incubator-hawq/pull/761
  
    +1


> Activate standby master failed after create a new database
> ----------------------------------------------------------
>
>                 Key: HAWQ-812
>                 URL: https://issues.apache.org/jira/browse/HAWQ-812
>             Project: Apache HAWQ
>          Issue Type: Bug
>            Reporter: Chunling Wang
>            Assignee: Lei Chang
>
> Activating the standby master fails after a new database is created. However, it succeeds if no new database is created, even if we create a new table and insert data.
> 1. Create a new database 'gptest'
> {code}
> [gpadmin@test1 ~]$ psql -l
>                  List of databases
>    Name    |  Owner  | Encoding | Access privileges
> -----------+---------+----------+-------------------
>  postgres  | gpadmin | UTF8     |
>  template0 | gpadmin | UTF8     |
>  template1 | gpadmin | UTF8     |
> (3 rows)
> [gpadmin@test1 ~]$ createdb gptest
> [gpadmin@test1 ~]$ psql -l
>                  List of databases
>    Name    |  Owner  | Encoding | Access privileges
> -----------+---------+----------+-------------------
>  gptest    | gpadmin | UTF8     |
>  postgres  | gpadmin | UTF8     |
>  template0 | gpadmin | UTF8     |
>  template1 | gpadmin | UTF8     |
> (4 rows)
> {code}
> 2. Stop HAWQ master
> {code}
> [gpadmin@test1 ~]$ hawq stop master -a
> 20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-Prepare to do 'hawq stop'
> 20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-You can find log in:
> 20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_stop_20160613.log
> 20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-GPHOME is set to:
> 20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-/data/pulse-agent-data/HAWQ-main-FeatureTest-opt-mutilnodeparallel-wcl/product/hawq/.
> 20160613:20:13:44:068559 hawq_stop:test1:gpadmin-[INFO]:-Stop hawq with args: ['stop', 'master']
> 20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-There are 0 connections to the database
> 20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart'
> 20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-Master host=test1
> 20160613:20:13:45:068559 hawq_stop:test1:gpadmin-[INFO]:-Stop hawq master
> 20160613:20:13:46:068559 hawq_stop:test1:gpadmin-[INFO]:-Master stopped successfully
> {code}
> 3. Activate standby master
> {code}
> [gpadmin@test1 ~]$ ssh test5 'source /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-mutilnodeparallel-wcl/product/hawq/./greenplum_path.sh; hawq activate standby -a'
> 20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-Prepare to do 'hawq activate'
> 20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-You can find log in:
> 20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_activate_20160613.log
> 20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-GPHOME is set to:
> 20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-/data/pulse-agent-data/HAWQ-main-FeatureTest-opt-mutilnodeparallel-wcl/product/hawq/.
> 20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-Activate hawq with args: ['activate', 'standby']
> 20160613:20:14:14:126841 hawq_activate:test5:gpadmin-[INFO]:-Starting to activate standby master 'test5'
> 20160613:20:14:15:126841 hawq_activate:test5:gpadmin-[INFO]:-HAWQ master is not running, skip
> 20160613:20:14:15:126841 hawq_activate:test5:gpadmin-[INFO]:-Stopping all the running segments
> 20160613:20:14:21:126841 hawq_activate:test5:gpadmin-[INFO]:-
> 20160613:20:14:21:126841 hawq_activate:test5:gpadmin-[INFO]:-Stopping running standby
> 20160613:20:14:23:126841 hawq_activate:test5:gpadmin-[INFO]:-Update master host name in hawq-site.xml
> 20160613:20:14:31:126841 hawq_activate:test5:gpadmin-[INFO]:-GUC hawq_master_address_host already exist in hawq-site.xml
> Update it with value: test5
> 20160613:20:14:31:126841 hawq_activate:test5:gpadmin-[INFO]:-Remove current standby from hawq-site.xml
> 20160613:20:14:39:126841 hawq_activate:test5:gpadmin-[INFO]:-Start master in master only mode
> {code}
> The command hangs and the master cannot start. The master log is as follows:
> {code}
> 2016-06-13 20:14:40.268022 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","database system was shut down at 2016-06-13 20:02:50 PDT",,,,,,,0,,"xlog.c",6205,
> 2016-06-13 20:14:40.268112 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","found recovery.conf file indicating standby takeover recovery needed",,,,,,,0,,"xlog.c",5485,
> 2016-06-13 20:14:40.268131 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","checkpoint record is at 0/1C75EF0",,,,,,,0,,"xlog.c",6304,
> 2016-06-13 20:14:40.268143 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","redo record is at 0/1C75EF0; undo record is at 0/0; shutdown TRUE",,,,,,,0,,"xlog.c",6338,
> 2016-06-13 20:14:40.268155 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","next transaction ID: 0/1003; next OID: 16508",,,,,,,0,,"xlog.c",6342,
> 2016-06-13 20:14:40.268165 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","next MultiXactId: 1; next MultiXactOffset: 0",,,,,,,0,,"xlog.c",6345,
> 2016-06-13 20:14:40.268176 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Forcing Crash Recovery for Master Standby takeover",,,,,,,0,,"xlog.c",6389,
> 2016-06-13 20:14:40.268195 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","standby takeover recovery in progress",,,,,,,0,,"xlog.c",6427,
> 2016-06-13 20:14:40.268891 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","redo starts at 0/1C75F40",,,,,,,0,,"xlog.c",6523,
> 2016-06-13 20:14:40.273313 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","record with zero length at 0/2639190",,,,,,,0,,"xlog.c",4110,
> 2016-06-13 20:14:40.273338 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","redo done at 0/2639140",,,,,,,0,,"xlog.c",6560,
> 2016-06-13 20:14:40.273352 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","end of transaction log location is 0/2639190",,,,,,,0,,"xlog.c",6582,
> 2016-06-13 20:14:40.273460 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","standby takeover recovery complete",,,,,,,0,,"xlog.c",5506,
> 2016-06-13 20:14:40.274904 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Need to Repair global sequence number 600 so use scanned maximum value 749 ('gp_persistent_relfile_node')",,,,,,,0,,"cdbpersistentstore.c",519,
> 2016-06-13 20:14:40.275093 PDT,,,p127518,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Finished startup pass 1.  Proceeding to startup crash recovery passes 2 and 3.",,,,,,,0,,"xlog.c",6816,
> 2016-06-13 20:14:40.284820 PDT,,,p127519,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","Finished startup crash recovery pass 2",,,,,,,0,,"xlog.c",6987,
> 2016-06-13 20:14:40.289053 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C75F40",,,,,"xlog redo checkpoint: redo 0/1C75F40; undo 0/0; tli 1; xid 0/1003; oid 16508; multi 1; offset 0; shutdown
> REDO PASS 3 @ 0/1C75F40; LSN 0/1C75F90: prev 0/1C75EF0; xid 0: XLOG - checkpoint: redo 0/1C75F40; undo 0/0; tli 1; xid 0/1003; oid 16508; multi 1; offset 0; shutdown",,0,,"xlog.c",8323,
> 2016-06-13 20:14:40.291597 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C763A0",,,,,"xlog redo checkpoint: redo 0/1C763A0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown
> REDO PASS 3 @ 0/1C763A0; LSN 0/1C763F0: prev 0/1C76370; xid 0: XLOG - checkpoint: redo 0/1C763A0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown",,0,,"xlog.c",8323,
> 2016-06-13 20:14:40.292625 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C763F0",,,,,"xlog redo checkpoint: redo 0/1C763F0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown
> REDO PASS 3 @ 0/1C763F0; LSN 0/1C76440: prev 0/1C763A0; xid 0: XLOG - checkpoint: redo 0/1C763F0; undo 0/0; tli 1; xid 0/1021; oid 16508; multi 1; offset 0; shutdown",,0,,"xlog.c",8323,
> 2016-06-13 20:14:40.295223 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C76D90",,,,,"xlog redo checkpoint: redo 0/1C76D90; undo 0/0; tli 1; xid 0/1046; oid 16508; multi 1; offset 0; online
> REDO PASS 3 @ 0/1C76D90; LSN 0/1C76DE0: prev 0/1C76D60; xid 0: XLOG - checkpoint: redo 0/1C76D90; undo 0/0; tli 1; xid 0/1046; oid 16508; multi 1; offset 0; online",,0,,"xlog.c",8323,
> 2016-06-13 20:14:40.295618 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart point at 0/1C76DE0",,,,,"xlog redo checkpoint: redo 0/1C76DE0; undo 0/0; tli 1; xid 0/1047; oid 16508; multi 1; offset 0; online
> REDO PASS 3 @ 0/1C76DE0; LSN 0/1C76E30: prev 0/1C76D90; xid 0: XLOG - checkpoint: redo 0/1C76DE0; undo 0/0; tli 1; xid 0/1047; oid 16508; multi 1; offset 0; online",,0,,"xlog.c",8323,
> 2016-06-13 20:14:40.306365 PDT,,,p127520,th-1212462816,,,,0,,,seg-10000,,,,,"FATAL","58P01","could not open relation 1663/16508/1247: No such file or directory","Database directory ""base/16508"" does not exist",,,,"xlog redo newpage: rel 1663/16508/1247; blk 0
> REDO PASS 3 @ 0/1C7B7A8; LSN 0/1C83800: prev 0/1C7B360; xid 1052: Heap - newpage: rel 1663/16508/1247; blk 0",,0,,"md.c",1012,"Stack trace:
> 1    0x87f232 postgres errstart + 0x252
> 2    0x7ad57a postgres <symbol not found> + 0x7ad57a
> 3    0x7ad678 postgres mdnblocks + 0x18
> 4    0x7af3b6 postgres smgrnblocks + 0x16
> 5    0x4f97e7 postgres XLogReadBuffer + 0x17
> 6    0x4c1bf7 postgres heap_redo + 0x4e7
> 7    0x4eb550 postgres <symbol not found> + 0x4eb550
> 8    0x4f4b65 postgres StartupXLOG_Pass3 + 0x155
> 9    0x4f6c08 postgres StartupProcessMain + 0x308
> 10   0x55629d postgres AuxiliaryProcessMain + 0x5bd
> 11   0x767706 postgres <symbol not found> + 0x767706
> 12   0x7689ef postgres <symbol not found> + 0x7689ef
> 13   0x76d7fd postgres <symbol not found> + 0x76d7fd
> 14   0x76f34e postgres PostmasterMain + 0xc7e
> 15   0x6c7e9a postgres main + 0x48a
> 16   0x3e0541ed1d libc.so.6 __libc_start_main + 0xfd
> 17   0x4a26a1 postgres <symbol not found> + 0x4a26a1
> "
> 2016-06-13 20:14:40.308171 PDT,,,p127516,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","startup pass 3 process (PID 127520) exited with exit code 1",,,,,,,0,,"postmaster.c",4726,
> 2016-06-13 20:14:40.308203 PDT,,,p127516,th-1212462816,,,,0,,,seg-10000,,,,,"LOG","00000","aborting startup due to startup process failure",,,,,,,0,,"postmaster.c",3912,
> {code}
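
For readers decoding the FATAL entry above: in PostgreSQL-derived systems the triple in `could not open relation 1663/16508/1247` is tablespace OID / database OID / relfilenode, and the log's own detail line points at the directory `base/16508`. A minimal sketch of that mapping follows. The helper name `relation_path` and the regular expression are hypothetical, and the assumption that 1663 is the default tablespace OID comes from PostgreSQL rather than from this report.

```python
import re
from pathlib import PurePosixPath

# Assumption: 1663 is the OID of the default tablespace, as in PostgreSQL.
DEFAULT_TABLESPACE_OID = 1663

# Matches "relation <tablespace>/<database>/<relfilenode>" in an error message.
REL_PATTERN = re.compile(r"relation (\d+)/(\d+)/(\d+)")


def relation_path(message):
    """Return the data-directory-relative path of the relation, or None.

    Hypothetical helper for illustration; it only handles the default
    tablespace, whose files live under base/<database OID>/<relfilenode>.
    """
    match = REL_PATTERN.search(message)
    if match is None:
        return None
    tablespace, database, relfilenode = (int(g) for g in match.groups())
    if tablespace != DEFAULT_TABLESPACE_OID:
        return None  # other tablespaces use a different layout; not handled
    return PurePosixPath("base") / str(database) / str(relfilenode)


message = "could not open relation 1663/16508/1247: No such file or directory"
print(relation_path(message))
```

The database OID in that path, 16508, is also the value the checkpoint records in the log report as the next OID. That is consistent with the missing directory belonging to a database created after that checkpoint, though the log alone does not prove it.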


