You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@geode.apache.org by "Brian Baynes (JIRA)" <ji...@apache.org> on 2017/06/08 21:44:18 UTC
[jira] [Updated] (GEODE-1992) When using locator-wait-time and cluster configuration causes error on server start.

     [ https://issues.apache.org/jira/browse/GEODE-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Baynes updated GEODE-1992:
--------------------------------
    Component/s:     (was: membership)
                 configuration

> When using locator-wait-time and cluster configuration causes error on server start.
> ------------------------------------------------------------------------------------
>
>                 Key: GEODE-1992
>                 URL: https://issues.apache.org/jira/browse/GEODE-1992
>             Project: Geode
>          Issue Type: Bug
>          Components: configuration
>            Reporter: Jens Deppe
>
> I'm trying to speed up a cluster start up by using the locator-wait-time property and launching all members asynchronously. Unfortunately there seems to be a race when cluster configuration is enabled (which it is by default).
> On the server the logs show the following:
> {noformat}
> [info 2016/10/11 21:40:02.229 UTC test-cluster-server-0 <main> tid=0x1] Starting membership services
> [info 2016/10/11 21:40:02.658 UTC test-cluster-server-0 <main> tid=0x1] JGroups channel created (took 429ms)
> [info 2016/10/11 21:40:02.727 UTC test-cluster-server-0 <main> tid=0x1] GemFire P2P Listener started on  null
> [info 2016/10/11 21:40:02.734 UTC test-cluster-server-0 <Geode Failure Detection Server thread 0> tid=0x20] Started failure detection server thread on /172.17.0.3:51937.
> [info 2016/10/11 21:40:04.374 UTC test-cluster-server-0 <main> tid=0x1] Attempting to join the distributed system through coordinator 172.17.0.2(test-cluster-locator-0:62:locator)<ec>:1024 using address 172.17.0.3(test-cluster-server-0:34):1024
> [info 2016/10/11 21:40:04.749 UTC test-cluster-server-0 <unicast receiver,test-cluster-server-0-22057> tid=0x1c] received new view: View[172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024|1] members: [172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024, 172.17.0.4(test-cluster-server-1:35)<v1>:1024{lead}, 172.17.0.3(test-cluster-server-0:34)<v1>:1024, 172.17.0.5(test-cluster-server-2:34)<v1>:1024]
>   old view is: null
> [info 2016/10/11 21:40:04.771 UTC test-cluster-server-0 <main> tid=0x1] Finished joining (took 2006ms).
> [info 2016/10/11 21:40:04.773 UTC test-cluster-server-0 <main> tid=0x1] Starting DistributionManager 172.17.0.3(test-cluster-server-0:34)<v1>:1024.  (took 3025 ms)
> [info 2016/10/11 21:40:04.782 UTC test-cluster-server-0 <main> tid=0x1] Initial (distribution manager) view =  View[172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024|1] members: [172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024, 172.17.0.4(test-cluster-server-1:35)<v1>:1024{lead}, 172.17.0.3(test-cluster-server-0:34)<v1>:1024, 172.17.0.5(test-cluster-server-2:34)<v1>:1024]
> [info 2016/10/11 21:40:04.782 UTC test-cluster-server-0 <main> tid=0x1] Admitting member <172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024>. Now there are 1 non-admin member(s).
> [info 2016/10/11 21:40:04.785 UTC test-cluster-server-0 <main> tid=0x1] Admitting member <172.17.0.4(test-cluster-server-1:35)<v1>:1024>. Now there are 2 non-admin member(s).
> [info 2016/10/11 21:40:04.785 UTC test-cluster-server-0 <main> tid=0x1] Admitting member <172.17.0.3(test-cluster-server-0:34)<v1>:1024>. Now there are 3 non-admin member(s).
> [info 2016/10/11 21:40:04.786 UTC test-cluster-server-0 <main> tid=0x1] Admitting member <172.17.0.5(test-cluster-server-2:34)<v1>:1024>. Now there are 4 non-admin member(s).
> [info 2016/10/11 21:40:04.980 UTC test-cluster-server-0 <Pooled High Priority Message Processor 2> tid=0x2a] Member 172.17.0.5(test-cluster-server-2:34)<v1>:1024 is not equivalent or in the same redundancy zone.
> [info 2016/10/11 21:40:04.980 UTC test-cluster-server-0 <Pooled High Priority Message Processor 1> tid=0x2b] Member 172.17.0.4(test-cluster-server-1:35)<v1>:1024 is not equivalent or in the same redundancy zone.
> [info 2016/10/11 21:40:04.998 UTC test-cluster-server-0 <P2P message reader for 172.17.0.4(test-cluster-server-1:35)<v1>:1024 shared unordered uid=1 port=38682> tid=0x25] Member 172.17.0.4(test-cluster-server-1:35)<v1>:1024 is not equivalent or in the same redundancy zone.
> [info 2016/10/11 21:40:05.020 UTC test-cluster-server-0 <P2P message reader for 172.17.0.5(test-cluster-server-2:34)<v1>:1024 shared unordered uid=1 port=59864> tid=0x27] Member 172.17.0.5(test-cluster-server-2:34)<v1>:1024 is not equivalent or in the same redundancy zone.
> [info 2016/10/11 21:40:05.070 UTC test-cluster-server-0 <P2P message reader for 172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024 shared unordered uid=2 port=36482> tid=0x2e] Member 172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024 is not equivalent or in the same redundancy zone.
> [error 2016/10/11 21:40:05.545 UTC test-cluster-server-0 <main> tid=0x1] org.apache.geode.GemFireConfigException: cluster configuration service not available
> [info 2016/10/11 21:40:05.572 UTC test-cluster-server-0 <Distributed system shutdown hook> tid=0xf] VM is exiting - shutting down distributed system
> {noformat}
> The locator log shows the following:
> {noformat}
> [info 2016/10/11 21:40:06.035 UTC test-cluster-locator-0 <Geode Membership View Creator> tid=0x27] Peer locator received new membership view: View[172.17.0.2(test-cluster-
> locator-0:62:locator)<ec><v0>:1024|2] members: [172.17.0.2(test-cluster-locator-0:62:locator)<ec><v0>:1024]  shutdown: [172.17.0.3(test-cluster-server-0:34)<v1>:1024, 172.
> 17.0.5(test-cluster-server-2:34)<v1>:1024, 172.17.0.4(test-cluster-server-1:35)<v1>:1024]
> [info 2016/10/11 21:40:06.036 UTC test-cluster-locator-0 <Geode Membership View Creator> tid=0x27] no recipients for new view aside from myself
> [info 2016/10/11 21:40:07.130 UTC test-cluster-locator-0 <main> tid=0x1] SharedConfigStatusRequestHandler installed
> [info 2016/10/11 21:40:07.131 UTC test-cluster-locator-0 <main> tid=0x1] Locator started on  172.17.0.2[10334]
> [info 2016/10/11 21:40:07.131 UTC test-cluster-locator-0 <main> tid=0x1] Starting server location for Distribution Locator on test-cluster-locator-0[10334]
> [info 2016/10/11 21:40:07.200 UTC test-cluster-locator-0 <Pooled Message Processor 1> tid=0x48] Created disk store cluster_config with unique id 7c8ab153c6ce48de-a9006bf9a
> 93a1069
> [info 2016/10/11 21:40:07.220 UTC test-cluster-locator-0 <Pooled Message Processor 1> tid=0x48] recovery region initialization took 3 ms
> [info 2016/10/11 21:40:07.249 UTC test-cluster-locator-0 <Pooled Message Processor 1> tid=0x48] Created oplog#1 drf for disk store cluster_config.
> [info 2016/10/11 21:40:07.252 UTC test-cluster-locator-0 <Pooled Message Processor 1> tid=0x48] Created oplog#1 crf for disk store cluster_config.
> [info 2016/10/11 21:40:07.257 UTC test-cluster-locator-0 <Pooled Message Processor 1> tid=0x48] Initializing region _ConfigurationRegion
> [info 2016/10/11 21:40:07.259 UTC test-cluster-locator-0 <Pooled Message Processor 1> tid=0x48] Region /_ConfigurationRegion was created on this member with the persistent
>  id /172.17.0.2:/tmp/test-cluster-locator-0/ConfigDiskDir_test-cluster-locator-0 created at timestamp 1476222007256 version 0 diskStoreId 7c8ab153c6ce48de-a9006bf9a93a1069
>  name test-cluster-locator-0.
> [info 2016/10/11 21:40:07.259 UTC test-cluster-locator-0 <Pooled Message Processor 1> tid=0x48] Initialization of region _ConfigurationRegion completed
> [info 2016/10/11 21:40:07.441 UTC test-cluster-locator-0 <Pooled Message Processor 1> tid=0x48] Completed writing the shared configuration to 'cluster_config' directory
> [info 2016/10/11 21:40:07.442 UTC test-cluster-locator-0 <Pooled Message Processor 1> tid=0x48] Getting Jar files from other locators
> [info 2016/10/11 21:40:07.442 UTC test-cluster-locator-0 <Pooled Message Processor 1> tid=0x48] No other locators present
> [info 2016/10/11 21:40:07.445 UTC test-cluster-locator-0 <Pooled Message Processor 1> tid=0x48] ConfigRequestHandler installed
> [info 2016/10/11 21:40:07.445 UTC test-cluster-locator-0 <Pooled Message Processor 1> tid=0x48] Cluster configuration service start up completed successfully and is now running ....
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)