Posted to user@geode.apache.org by "Thacker, Dharam" <dh...@jpmorgan.com> on 2017/06/05 06:39:06 UTC

How to deal with cluster configuration service failure

Hi Team,

Could someone help me understand how to deal with the scenario below, where the cluster configuration service fails to start on a second locator? What corrective action should we take to rectify this?

Note:
member001.IP.MASKED - IP address of member001
member002.IP.MASKED - IP address of member002

Locator logs on member002:

[info 2017/06/05 02:07:11.941 EDT Locator2 <Pooled Message Processor 1> tid=0x3d] Initializing region _ConfigurationRegion

[warning 2017/06/05 02:07:11.951 EDT Locator2 <Pooled Message Processor 1> tid=0x3d] Initialization failed for Region /_ConfigurationRegion
org.apache.geode.cache.persistence.ConflictingPersistentDataException: Region /_ConfigurationRegion refusing to initialize from member member001(Locator1:5160:locator)<ec><v0>:1024 with persistent data /member001.IP.MASKED:/local/apps/shared/geode/members/Locator1/work/ConfigDiskDir_Locator1 created at timestamp 1496241336712 version 0 diskStoreId 31efa18230134865-b4fd0fcbde63ade6 name Locator1 which was offline when the local data from /member002.IP.MASKED:/local/apps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2 created at timestamp 1496241344046 version 0 diskStoreId df94511d0f3d4295-91ec9286a18aaa75 name Locator2 was last online
        at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:751)
        at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:812)
        at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
        at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1267)
        at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1101)
        at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3308)
        at org.apache.geode.distributed.internal.ClusterConfigurationService.getConfigurationRegion(ClusterConfigurationService.java:709)
        at org.apache.geode.distributed.internal.ClusterConfigurationService.initSharedConfiguration(ClusterConfigurationService.java:426)
        at org.apache.geode.distributed.internal.InternalLocator$SharedConfigurationRunnable.run(InternalLocator.java:649)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.geode.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:621)
        at org.apache.geode.distributed.internal.DistributionManager$4$1.run(DistributionManager.java:878)
        at java.lang.Thread.run(Thread.java:745)

[error 2017/06/05 02:07:11.959 EDT Locator2 <Pooled Message Processor 1> tid=0x3d] Error occurred while initializing cluster configuration
java.lang.RuntimeException: Error occurred while initializing cluster configuration
        at org.apache.geode.distributed.internal.ClusterConfigurationService.getConfigurationRegion(ClusterConfigurationService.java:722)
        at org.apache.geode.distributed.internal.ClusterConfigurationService.initSharedConfiguration(ClusterConfigurationService.java:426)
        at org.apache.geode.distributed.internal.InternalLocator$SharedConfigurationRunnable.run(InternalLocator.java:649)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.geode.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:621)
        at org.apache.geode.distributed.internal.DistributionManager$4$1.run(DistributionManager.java:878)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.geode.cache.persistence.ConflictingPersistentDataException: Region /_ConfigurationRegion refusing to initialize from member member001(Locator1:5160:locator)<ec><v0>:1024 with persistent data /member001.IP.MASKED:/local/apps/shared/geode/members/Locator1/work/ConfigDiskDir_Locator1 created at timestamp 1496241336712 version 0 diskStoreId 31efa18230134865-b4fd0fcbde63ade6 name Locator1 which was offline when the local data from /member002.IP.MASKED:/local/apps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2 created at timestamp 1496241344046 version 0 diskStoreId df94511d0f3d4295-91ec9286a18aaa75 name Locator2 was last online
        at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:751)
        at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:812)
        at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
        at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1267)
        at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1101)
        at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3308)
        at org.apache.geode.distributed.internal.ClusterConfigurationService.getConfigurationRegion(ClusterConfigurationService.java:709)
        ... 7 more

Thanks & Regards,
Dharam

This message is confidential and subject to terms at: http://www.jpmorgan.com/emaildisclaimer including on confidentiality, legal privilege, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.

RE: How to deal with cluster configuration service failure

Posted by "Thacker, Dharam" <dh...@jpmorgan.com>.
Hi,

This looks a bit different to me. I can provide some more information.

Host reboot information:

user@member002:/ $ last reboot | head -1
reboot   system boot  2.6.32-642.11.1. Sun Jun  4 22:44 - 12:20  (13:36)

user@member001:/ $ last reboot | head -1
reboot   system boot  2.6.32-573.35.2. Mon Mar  6 16:56 - 12:21 (90+18:24)

The output above shows that member001 was never rebooted around the time of the failure, which I can confirm from the timestamps as well.

If I follow error logs,

Exception: ConflictingPersistentDataException

Conditions:
On member001 : ConfigDiskDir_Locator1 created at timestamp 1496241336712 version 0 diskStoreId 31efa18230134865-b4fd0fcbde63ade6
On member002 : ConfigDiskDir_Locator2 created at timestamp 1496241344046 version 0 diskStoreId df94511d0f3d4295-91ec9286a18aaa75
Later on,
1. member001 never actually went offline, but member002 was rebooted [as explained above]
2. A locator and a server were already running on member001
3. A locator and a server were requested to start on member002
4. member002 refused, as explained in "Conditions"
Thanks & Regards,
Dharam

From: Jinmei Liao [mailto:jiliao@pivotal.io]
Sent: Monday, June 05, 2017 8:38 PM
To: user@geode.apache.org
Subject: Re: How to deal with cluster configuration service failure

Is this related to https://issues.apache.org/jira/browse/GEODE-3003?




--
Cheers

Jinmei


RE: How to deal with cluster configuration service failure

Posted by "Thacker, Dharam" <dh...@jpmorgan.com>.
Thanks everyone for the suggestions! I will note this down and follow a strict start/stop sequence as well.

My startup script already has information about all locators. I can see the property below active in the locator.properties file on both hosts too:
locators=member001[10334],member002[10334]


Regarding Darrel’s point, let me present two scenarios.

“One way this could happen is if you have just one locator running and it writes the cluster config to its disk-store. You then shut that locator down and start up a different one. It would have no knowledge of the other locator that you shut down so it would create a brand new cluster config in its disk-store. If at some point these two locators finally see each other the second one to start will throw a ConflictingPersistentDataException”

Let’s say that is actually the case, as Darrel explained, where I started only one locator, but with the property --locators=member001[10334],member002[10334]:


1. Locator1 is running on member001 with --locators=member001[10334],member002[10334] -> it creates its own cluster config in its disk store -> then I shut down Locator1 on member001

2. I start Locator2 on member002 with --locators=member001[10334],member002[10334] -> it creates its own fresh cluster config in its disk store

3. Now I start Locator1 on member001 again with --locators=member001[10334],member002[10334]

Is that really a problem even if one locator knows about the other via the --locators property? Of course, the other locator is not running at the same time. I believe this might be the same situation as mine. Could you confirm once again?

But if the above is true, then let me tweak this scenario and present it in a different way:


1. Locator1 is running on member001 with --locators=member001[10334],member002[10334] -> it creates its own cluster config in its disk store -> then after some time Locator1 crashes or the machine goes down [in the previous case I shut it down manually]

2. Locator2 starts on member002 with --locators=member001[10334],member002[10334] via automated jobs scheduled with, say, Autosys

3. After some time, the member001 machine comes back up, and a business process validator determines that Locator1 should be running and attempts to start it on member001

Ah, then it would break the system once again.

Any points there?

Thanks & Regards,
Dharam


From: Kirk Lund [mailto:klund@apache.org]
Sent: Monday, June 05, 2017 10:39 PM
To: user@geode.apache.org
Subject: Re: How to deal with cluster configuration service failure

Two locators in the same cluster use the DistributedLockService to determine which one is the primary for cluster config. If two locators don't know about each other, then they are not part of the same cluster, and a server cannot join two clusters.

On Mon, Jun 5, 2017 at 9:48 AM, Mark Secrist <ms...@pivotal.io> wrote:
I also wonder if it could happen if the two locators are started without knowledge of each other (via the locators property).

On Mon, Jun 5, 2017 at 10:45 AM, Darrel Schneider <ds...@pivotal.io> wrote:
A ConflictingPersistentDataException indicates that two copies of a disk-store were written independently of each other. When using cluster configuration, the locator uses a disk-store to write the cluster configuration to disk. It looks like that is the disk-store that is throwing ConflictingPersistentDataException.
One way this could happen is if you have just one locator running and it writes the cluster config to its disk-store. You then shut that locator down and start up a different one. It would have no knowledge of the other locator that you shut down, so it would create a brand new cluster config in its disk-store. If at some point these two locators finally see each other, the second one to start will throw a ConflictingPersistentDataException.

In this case you need to pick which one of these disk-stores you want to be the winner and remove the other disk-store. To help pick the winner, I think each locator also writes some cache.xml files that show you in plain text what is in the binary disk-store files. These can also help you determine what configuration you will lose when you remove one of the disk-stores. You can get that missing config back by rerunning the same gfsh commands (for example, create region).

Another option is to use the gfsh import/export commands. Before deleting either disk-store, start the locators up one at a time and export the cluster config. Then you can start fresh by importing the config.
You might hit a problem in which one of these disk-stores now knows about the other, so when you try to start it by itself it fails, saying it is waiting for the other to start up. Then when you start the other, you get the ConflictingPersistentDataException. In that case you would not be able to start them up one at a time to do the export, so you would need to fall back on the cache.xml files. Someone who knows more about cluster config might be able to help you further.

You should be able to avoid this in the future by making sure you start both locators before doing your first gfsh create command. That way both disk-stores will know about each other and will be kept in sync.
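The export/import option described above can be sketched as a gfsh session. This is only an illustration: the locator address and zip file path are assumptions, and the exact command options may vary by Geode version.

```
# Start only the locator whose configuration you want to keep, then:
connect --locator=member001[10334]
export cluster-configuration --zip-file-name=/tmp/cluster-config.zip

# Stop that locator, delete the ConfigDiskDir_* directory of the losing
# locator on disk, start both locators, reconnect, and restore:
import cluster-configuration --zip-file-name=/tmp/cluster-config.zip
```

Note that, at least in some Geode versions, import cluster-configuration is only allowed while no cache servers are running, so plan the restore before starting servers.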





--

Mark Secrist | Sr Manager, Global Education Delivery

msecrist@pivotal.io

970.214.4567 Mobile

pivotal.io

Follow Us: Twitter | LinkedIn | Facebook | YouTube | Google+



RE: How to deal with cluster configuration service failure

Posted by "Thacker, Dharam" <dh...@jpmorgan.com>.
I could reproduce the issue exactly as Darrel explained, which answers my first scenario.

Steps:

Console1:
start locator --name=locator1 --port=10334 --locators=localhost[10334],localhost[10335] --enable-cluster-configuration=true --log-level=config
stop locator --name=locator1

Console2:
start locator --name=locator2 --port=10335 --locators=localhost[10334],localhost[10335] --enable-cluster-configuration=true --log-level=config
stop locator --name=locator2

Console1:
start locator --name=locator1 --port=10334 --locators=localhost[10334],localhost[10335] --enable-cluster-configuration=true --log-level=config

Console2:
start locator --name=locator2 --port=10335 --locators=localhost[10334],localhost[10335] --enable-cluster-configuration=true --log-level=config

Now it says "Cluster configuration service failed to start, please check the log file for errors", with the same error in the locator log file as earlier in this thread.
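Following Darrel's advice, recovering from this reproduction would look something like the sketch below, assuming locator2's configuration is the one to discard and the default gfsh working-directory layout (both are assumptions for illustration):

```
stop locator --name=locator2
# On disk, remove locator2's conflicting config disk-store, e.g.:
#   rm -rf locator2/ConfigDiskDir_locator2
start locator --name=locator2 --port=10335 --locators=localhost[10334],localhost[10335] --enable-cluster-configuration=true
```

With its old disk-store gone, locator2 should pull the shared configuration from locator1 instead of conflicting with it.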

But as explained in my second scenario, what if this happens due to machine crash events?
For future releases, do you see any better way, perhaps by introducing more communication between locators, to avoid such a scenario?

As an example, the locators could communicate with each other, automatically determine which one should delete the older/corrupted/invalid cluster configuration from its own "work" directory, and automatically import a fresh cluster config from the other locator.

Thanks & Regards,
Dharam

From: Thacker, Dharam
Sent: Tuesday, June 06, 2017 9:53 AM
To: 'user@geode.apache.org'
Subject: RE: How to deal with cluster configuration service failure

Thanks everyone for suggestions! I will note it down and follow strict start/stop sequence as well.

My startup script already has information about all locators. I can see both below property active in my locator.properties file in both hosts too!
locators=member001[10334],member002[10334]


Regarding Darrel’s point, let me present 2 scenarios…

“One way this could happen is if you have just one locator running and it writes the cluster config to its disk-store. You then shut that locator down and start up a different one. It would have no knowledge of the other locator that you shut down so it would create a brand new cluster config in its disk-store. If at some point these two locators finally see each other the second one to start will throw a ConflictingPersistentDataException”

Let’s say that’s actually the case as Darrel explained where I have started only 1 locator but with property --locators=member001[10334],member002[10334].


1.       Locator1 is running in member001 with --locators=member001[10334],member002[10334].  -> It creates its own cluster config to its disk store -> Then I am shutting down the locator1 on member001

2.       Start Locator2 on member002 with --locators=member001[10334],member002[10334] -> It would create its own fresh cluster config in its disk store

3.       Now I again start Locator1 on member001 with --locators=member001[10334],member002[10334]

Is that really a problem even if 1 locator knows other locator via [--locators] property? Of course at the same time other locator is not running though. I believe then it might be the same case with me. Could you confirm once again?

But if above is true, then let me twik this scenario and present in a different way,


1.       Locator1 is running in member001 with --locators=member001[10334],member002[10334].  -> It creates its own cluster config to its disk store -> Then after sometime locator1 crashes/machine goes down [In previous case I was manually shutting that down]

2.       Locator2 starts on member002 with --locators=member001[10334],member002[10334] using automated jobs scheduled using say autosys

3.       After sometime, member001 machine suddenly comes up and business process validator determines that locator1 should be running and it attempts to start locator1 in member001

Ahh then it would break the system once again.

Any points there?

Thanks & Regards,
Dharam


From: Kirk Lund [mailto:klund@apache.org]
Sent: Monday, June 05, 2017 10:39 PM
To: user@geode.apache.org<ma...@geode.apache.org>
Subject: Re: How to deal with cluster configuration service failure

Two locators in the same cluster use the DistributedLockService to determine which one is the primary for cluster config. If two locators don't know about each other, then they are not part of the same cluster, and a server cannot join two clusters.

On Mon, Jun 5, 2017 at 9:48 AM, Mark Secrist <ms...@pivotal.io>> wrote:
I also wonder if it could be that way if the two locators are started without knowledge of each other (via the locators) property.

On Mon, Jun 5, 2017 at 10:45 AM, Darrel Schneider <ds...@pivotal.io>> wrote:
A ConflictingPersistentDataException indicates that two copies of a disk-store where written independently of each other. When using cluster configuration the locator uses a disk-store to write the cluster configuration to disk. It looks like that it the disk-store that is throwing ConflictingPersistentDataException.
One way this could happen is if you have just one locator running and it writes the cluster config to its disk-store. You then shut that locator down and start up a different one. It would have no knowledge of the other locator that you shut down so it would create a brand new cluster config in its disk-store. If at some point these two locators finally see each other the second one to start will throw a ConflictingPersistentDataException. In this case you need to pick which one of these disk-stores you want to be the winner and remove the other disk store. To pick the best winner I think each locator also writes some cache.xml files that will show you in plain text what is in the binary disk-store files. This could also help you in determining what configuration you will lose when you remove one of these disk-stores. You can get that missing config back by doing the same gfsh commands (for example create region). Another option would be to use the gfsh import/export commands. Before deleting either disk-store start them up one at a time and export the cluster config. Then you can start fresh by importing the config.
You might hit a problem in which one of the disk stores now knows about the other, so when you try to start that locator by itself it fails, saying it is waiting for the other to start up; and when you do start the other, you get the ConflictingPersistentDataException. In that case you would not be able to start them one at a time to do the export, so you need to fall back to the cache.xml files. Someone who knows more about cluster config might be able to help you further.
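As an untested sketch of the export/import route described above (the locator address and zip path here are placeholders, not taken from this thread), the sequence would look roughly like:

```shell
# 1. Start ONLY the locator whose configuration you want to keep,
#    connect to it, and export its cluster configuration:
gfsh -e "connect --locator=host1[10334]" \
     -e "export cluster-configuration --zip-file-name=/tmp/cluster-config.zip"

# 2. Stop that locator, then remove BOTH conflicting config disk stores
#    (the ConfigDiskDir_* directories under each locator's working dir).

# 3. Restart both locators together and import the saved configuration:
gfsh -e "connect --locator=host1[10334]" \
     -e "import cluster-configuration --zip-file-name=/tmp/cluster-config.zip"
```

Note that import cluster-configuration generally has to run before any cache servers have joined the cluster, so plan the import for a window when only the locators are up.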

You should be able to avoid this in the future by making sure you start both locators before running your first gfsh create command. That way the two disk stores will know about each other and will be kept in sync.
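A minimal sketch of that startup order, assuming hypothetical hosts host1/host2 and the default locator port (adjust names, ports, and directories to your environment):

```shell
# Start both locators, each configured with the FULL locator list,
# so their config disk stores discover each other immediately:
gfsh start locator --name=locator1 --port=10334 \
  --locators=host1[10334],host2[10334]        # run on host1
gfsh start locator --name=locator2 --port=10334 \
  --locators=host1[10334],host2[10334]        # run on host2

# Only once both locators are members of the same cluster should the
# first configuration-changing command be issued:
gfsh -e "connect --locator=host1[10334]" \
     -e "create region --name=exampleRegion --type=REPLICATE"
```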

On Mon, Jun 5, 2017 at 8:07 AM, Jinmei Liao <ji...@pivotal.io> wrote:
Is this related to https://issues.apache.org/jira/browse/GEODE-3003?

On Sun, Jun 4, 2017 at 11:39 PM, Thacker, Dharam <dh...@jpmorgan.com> wrote:
Hi Team,

Could someone help to understand how to deal with below scenario where cluster configuration service fails to start in another locator? Which supportive action should we take to rectify this?

Note:
member001.IP.MASKED – IP address of member001
member002.IP.MASKED – IP address of member002

Locator logs on member002:

[info 2017/06/05 02:07:11.941 EDT RavenLocator2 <Pooled Message Processor 1> tid=0x3d] Initializing region _ConfigurationRegion

[warning 2017/06/05 02:07:11.951 EDT Locator2 <Pooled Message Processor 1> tid=0x3d] Initialization failed for Region /_ConfigurationRegion
org.apache.geode.cache.persistence.ConflictingPersistentDataException: Region /_ConfigurationRegion refusing to initialize from member member001(Locator1:5160:locator)<ec><v0>:1024 with persistent data /169.87.179.46:/local/apps/shared/geode/members/Locator1/work/ConfigDiskDir_Locator1 created at timestamp 1496241336712 version 0 diskStoreId 31efa18230134865-b4fd0fcbde63ade6 name Locator1 which was offline when the local data from /member002.IP.MASKED:/local/apps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2 created at timestamp 1496241344046 version 0 diskStoreId df94511d0f3d4295-91ec9286a18aaa75 name Locator2 was last online
        at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:751)
        at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:812)
        at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
        at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1267)
        at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1101)
        at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3308)
        at org.apache.geode.distributed.internal.ClusterConfigurationService.getConfigurationRegion(ClusterConfigurationService.java:709)
        at org.apache.geode.distributed.internal.ClusterConfigurationService.initSharedConfiguration(ClusterConfigurationService.java:426)
        at org.apache.geode.distributed.internal.InternalLocator$SharedConfigurationRunnable.run(InternalLocator.java:649)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.geode.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:621)
        at org.apache.geode.distributed.internal.DistributionManager$4$1.run(DistributionManager.java:878)
        at java.lang.Thread.run(Thread.java:745)

[error 2017/06/05 02:07:11.959 EDT Locator2 <Pooled Message Processor 1> tid=0x3d] Error occurred while initializing cluster configuration
java.lang.RuntimeException: Error occurred while initializing cluster configuration
        at org.apache.geode.distributed.internal.ClusterConfigurationService.getConfigurationRegion(ClusterConfigurationService.java:722)
        at org.apache.geode.distributed.internal.ClusterConfigurationService.initSharedConfiguration(ClusterConfigurationService.java:426)
        at org.apache.geode.distributed.internal.InternalLocator$SharedConfigurationRunnable.run(InternalLocator.java:649)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at org.apache.geode.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:621)
        at org.apache.geode.distributed.internal.DistributionManager$4$1.run(DistributionManager.java:878)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.geode.cache.persistence.ConflictingPersistentDataException: Region /_ConfigurationRegion refusing to initialize from member member001(Locator1:5160:locator)<ec><v0>:1024 with persistent data /member001.IP.MASKED:/local/apps/shared/geode/members/Locator1/work/ConfigDiskDir_RavenLocator1 created at timestamp 1496241336712 version 0 diskStoreId 31efa18230134865-b4fd0fcbde63ade6 name RavenLocator1 which was offline when the local data from /member002.IP.MASKED:/local/apps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2 created at timestamp 1496241344046 version 0 diskStoreId df94511d0f3d4295-91ec9286a18aaa75 name Locator2 was last online
        at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:751)
        at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:812)
        at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
        at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1267)
        at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1101)
        at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3308)
        at org.apache.geode.distributed.internal.ClusterConfigurationService.getConfigurationRegion(ClusterConfigurationService.java:709)
        ... 7 more

Thanks & Regards,
Dharam

This message is confidential and subject to terms at: http://www.jpmorgan.com/emaildisclaimer<http://www.jpmorgan.com/emaildisclaimer> including on confidentiality, legal privilege, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.



--
Cheers

Jinmei




--

Mark Secrist | Sr Manager, Global Education Delivery

msecrist@pivotal.io

970.214.4567 Mobile

pivotal.io<http://www.pivotal.io/>

Follow Us: Twitter<http://www.twitter.com/pivotal> | LinkedIn<http://www.linkedin.com/company/pivotalsoftware> | Facebook<http://www.facebook.com/pivotalsoftware> | YouTube<http://www.youtube.com/gopivotal> | Google+<https://plus.google.com/105320112436428794490>


This message is confidential and subject to terms at: http://www.jpmorgan.com/emaildisclaimer including on confidentiality, legal privilege, viruses and monitoring of electronic messages. If you are not the intended recipient, please delete this message and notify the sender immediately. Any unauthorized use is strictly prohibited.

Re: How to deal with cluster configuration service failure

Posted by Kirk Lund <kl...@apache.org>.
Two locators in the same cluster use the DistributedLockService to
determine which one is the primary for cluster config. If two locators
don't know about each other, then they are not part of the same cluster,
and a server cannot join two clusters.

On Mon, Jun 5, 2017 at 9:48 AM, Mark Secrist <ms...@pivotal.io> wrote:

> I also wonder if it could be that way if the two locators are started
> without knowledge of each other (via the locators) property.
>
> On Mon, Jun 5, 2017 at 10:45 AM, Darrel Schneider <ds...@pivotal.io>
> wrote:
>
>> A ConflictingPersistentDataException indicates that two copies of a
>> disk-store where written independently of each other. When using cluster
>> configuration the locator uses a disk-store to write the cluster
>> configuration to disk. It looks like that it the disk-store that is
>> throwing ConflictingPersistentDataException.
>> One way this could happen is if you have just one locator running and it
>> writes the cluster config to its disk-store. You then shut that locator
>> down and start up a different one. It would have no knowledge of the other
>> locator that you shut down so it would create a brand new cluster config in
>> its disk-store. If at some point these two locators finally see each other
>> the second one to start will throw a ConflictingPersistentDataException.
>> In this case you need to pick which one of these disk-stores you want to be
>> the winner and remove the other disk store. To pick the best winner I think
>> each locator also writes some cache.xml files that will show you in plain
>> text what is in the binary disk-store files. This could also help you in
>> determining what configuration you will lose when you remove one of these
>> disk-stores. You can get that missing config back by doing the same gfsh
>> commands (for example create region). Another option would be to use the
>> gfsh import/export commands. Before deleting either disk-store start them
>> up one at a time and export the cluster config. Then you can start fresh by
>> importing the config.
>> You might hit a problem in which one of these disk-stores now knows about
>> the other so when you try to start it by itself it fails saying it is
>> waiting for the other to start up. Then when you do that you get the
>> ConflictingPersistentDataException. In that case you would not be able
>> to start them up one at a time to do the export so in that case you need to
>> find the cache.xml files. Someone who knows more about cluster config might
>> be able to help you more.
>>
>> You should be able to avoid this in the future by making sure you start
>> both locators before doing your first gfsh create command. That way both
>> disk-stores will know about each other and will be kept in sync.
>>
>> On Mon, Jun 5, 2017 at 8:07 AM, Jinmei Liao <ji...@pivotal.io> wrote:
>>
>>> Is this related to https://issues.apache.org/jira/browse/GEODE-3003?
>>>
>>> On Sun, Jun 4, 2017 at 11:39 PM, Thacker, Dharam <
>>> dharam.thacker@jpmorgan.com> wrote:
>>>
>>>> Hi Team,
>>>>
>>>>
>>>>
>>>> Could someone help to understand how to deal with below scenario where
>>>> cluster configuration service fails to start in another locator? Which
>>>> supportive action should we take to rectify this?
>>>>
>>>>
>>>>
>>>> *Note*:
>>>>
>>>> member001.IP.MAKSED – IP address of member001
>>>>
>>>> member002.IP.MASKED – IP address of member002
>>>>
>>>>
>>>>
>>>> *Locator logs on member002:*
>>>>
>>>>
>>>>
>>>> [info 2017/06/05 02:07:11.941 EDT RavenLocator2 <Pooled Message
>>>> Processor 1> tid=0x3d] Initializing region _ConfigurationRegion
>>>>
>>>>
>>>>
>>>> [warning 2017/06/05 02:07:11.951 EDT Locator2 <Pooled Message Processor
>>>> 1> tid=0x3d] Initialization failed for Region /_ConfigurationRegion
>>>>
>>>> org.apache.geode.cache.persistence.ConflictingPersistentDataException:
>>>> Region /_ConfigurationRegion refusing to initialize from member
>>>> member001(Locator1:5160:locator)<ec><v0>:1024 with persistent data
>>>> /169.87.179.46:/local/apps/shared/geode/members/Locator1/work/ConfigDiskDir_Locator1
>>>> created at timestamp 1496241336712 version 0 diskStoreId
>>>> 31efa18230134865-b4fd0fcbde63ade6 name Locator1 which was offline when
>>>> the local data from /*member002.IP.MASKED*:/local/ap
>>>> ps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2 created
>>>> at timestamp 1496241344046 version 0 diskStoreId
>>>> df94511d0f3d4295-91ec9286a18aaa75 name Locator2 was last online
>>>>
>>>>         at org.apache.geode.internal.cache.persistence.PersistenceAdvis
>>>> orImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:751)
>>>>
>>>>         at org.apache.geode.internal.cache.persistence.PersistenceAdvis
>>>> orImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:812)
>>>>
>>>>         at org.apache.geode.internal.cache.persistence.CreatePersistent
>>>> RegionProcessor.getInitialImageAdvice(CreatePersistentRegion
>>>> Processor.java:52)
>>>>
>>>>         at org.apache.geode.internal.cache.DistributedRegion.getInitial
>>>> ImageAndRecovery(DistributedRegion.java:1267)
>>>>
>>>>         at org.apache.geode.internal.cache.DistributedRegion.initialize
>>>> (DistributedRegion.java:1101)
>>>>
>>>>         at org.apache.geode.internal.cache.GemFireCacheImpl.createVMReg
>>>> ion(GemFireCacheImpl.java:3308)
>>>>
>>>>         at org.apache.geode.distributed.internal.ClusterConfigurationSe
>>>> rvice.getConfigurationRegion(ClusterConfigurationService.java:709)
>>>>
>>>>         at org.apache.geode.distributed.internal.ClusterConfigurationSe
>>>> rvice.initSharedConfiguration(ClusterConfigurationService.java:426)
>>>>
>>>>         at org.apache.geode.distributed.internal.InternalLocator$Shared
>>>> ConfigurationRunnable.run(InternalLocator.java:649)
>>>>
>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>>>> Executor.java:1142)
>>>>
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>>>> lExecutor.java:617)
>>>>
>>>>         at org.apache.geode.distributed.internal.DistributionManager.ru
>>>> nUntilShutdown(DistributionManager.java:621)
>>>>
>>>>         at org.apache.geode.distributed.internal.DistributionManager$4$
>>>> 1.run(DistributionManager.java:878)
>>>>
>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>
>>>>
>>>>
>>>> [error 2017/06/05 02:07:11.959 EDT Locator2 <Pooled Message Processor
>>>> 1> tid=0x3d] Error occurred while initializing cluster configuration
>>>>
>>>> java.lang.RuntimeException: Error occurred while initializing cluster
>>>> configuration
>>>>
>>>>         at org.apache.geode.distributed.internal.ClusterConfigurationSe
>>>> rvice.getConfigurationRegion(ClusterConfigurationService.java:722)
>>>>
>>>>         at org.apache.geode.distributed.internal.ClusterConfigurationSe
>>>> rvice.initSharedConfiguration(ClusterConfigurationService.java:426)
>>>>
>>>>         at org.apache.geode.distributed.internal.InternalLocator$Shared
>>>> ConfigurationRunnable.run(InternalLocator.java:649)
>>>>
>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>>>> Executor.java:1142)
>>>>
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>>>> lExecutor.java:617)
>>>>
>>>>         at org.apache.geode.distributed.internal.DistributionManager.ru
>>>> nUntilShutdown(DistributionManager.java:621)
>>>>
>>>>         at org.apache.geode.distributed.internal.DistributionManager$4$
>>>> 1.run(DistributionManager.java:878)
>>>>
>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> Caused by: org.apache.geode.cache.persistence.ConflictingPersistentDataException:
>>>> Region /_ConfigurationRegion refusing to initialize from member
>>>> member001(Locator1:5160:locator)<ec><v0>:1024 with persistent data /
>>>> *member001.IP.MASKED*:/local/apps/shared/geode/members/Locato
>>>> r1/work/ConfigDiskDir_RavenLocator1 created at timestamp 1496241336712
>>>> version 0 diskStoreId 31efa18230134865-b4fd0fcbde63ade6 name
>>>> RavenLocator1 which was offline when the local data from /
>>>> *member002.IP.MASKED*:/local/apps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2
>>>> created at timestamp 1496241344046 version 0 diskStoreId
>>>> df94511d0f3d4295-91ec9286a18aaa75 name Locator2 was last online
>>>>
>>>>         at org.apache.geode.internal.cache.persistence.PersistenceAdvis
>>>> orImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:751)
>>>>
>>>>         at org.apache.geode.internal.cache.persistence.PersistenceAdvis
>>>> orImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:812)
>>>>
>>>>         at org.apache.geode.internal.cache.persistence.CreatePersistent
>>>> RegionProcessor.getInitialImageAdvice(CreatePersistentRegion
>>>> Processor.java:52)
>>>>
>>>>         at org.apache.geode.internal.cache.DistributedRegion.getInitial
>>>> ImageAndRecovery(DistributedRegion.java:1267)
>>>>
>>>>         at org.apache.geode.internal.cache.DistributedRegion.initialize
>>>> (DistributedRegion.java:1101)
>>>>
>>>>         at org.apache.geode.internal.cache.GemFireCacheImpl.createVMReg
>>>> ion(GemFireCacheImpl.java:3308)
>>>>
>>>>         at org.apache.geode.distributed.internal.ClusterConfigurationSe
>>>> rvice.getConfigurationRegion(ClusterConfigurationService.java:709)
>>>>
>>>>         ... 7 more
>>>>
>>>>
>>>>
>>>> Thanks & Regards,
>>>>
>>>> Dharam
>>>>
>>>> This message is confidential and subject to terms at: http://
>>>> www.jpmorgan.com/emaildisclaimer including on confidentiality, legal
>>>> privilege, viruses and monitoring of electronic messages. If you are not
>>>> the intended recipient, please delete this message and notify the sender
>>>> immediately. Any unauthorized use is strictly prohibited.
>>>>
>>>
>>>
>>>
>>> --
>>> Cheers
>>>
>>> Jinmei
>>>
>>
>>
>
>
> --
>
> *Mark Secrist | Sr Manager, **Global Education Delivery*
>
> msecrist@pivotal.io
>
> 970.214.4567 Mobile
>
>   *pivotal.io <http://www.pivotal.io/>*
>
> Follow Us: Twitter <http://www.twitter.com/pivotal> | LinkedIn
> <http://www.linkedin.com/company/pivotalsoftware> | Facebook
> <http://www.facebook.com/pivotalsoftware> | YouTube
> <http://www.youtube.com/gopivotal> | Google+
> <https://plus.google.com/105320112436428794490>
>

Re: How to deal with cluster configuration service failure

Posted by Mark Secrist <ms...@pivotal.io>.
I also wonder if it could be that way if the two locators are started
without knowledge of each other (via the locators) property.

On Mon, Jun 5, 2017 at 10:45 AM, Darrel Schneider <ds...@pivotal.io>
wrote:

> A ConflictingPersistentDataException indicates that two copies of a
> disk-store where written independently of each other. When using cluster
> configuration the locator uses a disk-store to write the cluster
> configuration to disk. It looks like that it the disk-store that is
> throwing ConflictingPersistentDataException.
> One way this could happen is if you have just one locator running and it
> writes the cluster config to its disk-store. You then shut that locator
> down and start up a different one. It would have no knowledge of the other
> locator that you shut down so it would create a brand new cluster config in
> its disk-store. If at some point these two locators finally see each other
> the second one to start will throw a ConflictingPersistentDataException.
> In this case you need to pick which one of these disk-stores you want to be
> the winner and remove the other disk store. To pick the best winner I think
> each locator also writes some cache.xml files that will show you in plain
> text what is in the binary disk-store files. This could also help you in
> determining what configuration you will lose when you remove one of these
> disk-stores. You can get that missing config back by doing the same gfsh
> commands (for example create region). Another option would be to use the
> gfsh import/export commands. Before deleting either disk-store start them
> up one at a time and export the cluster config. Then you can start fresh by
> importing the config.
> You might hit a problem in which one of these disk-stores now knows about
> the other so when you try to start it by itself it fails saying it is
> waiting for the other to start up. Then when you do that you get the
> ConflictingPersistentDataException. In that case you would not be able to
> start them up one at a time to do the export so in that case you need to
> find the cache.xml files. Someone who knows more about cluster config might
> be able to help you more.
>
> You should be able to avoid this in the future by making sure you start
> both locators before doing your first gfsh create command. That way both
> disk-stores will know about each other and will be kept in sync.
>
> On Mon, Jun 5, 2017 at 8:07 AM, Jinmei Liao <ji...@pivotal.io> wrote:
>
>> Is this related to https://issues.apache.org/jira/browse/GEODE-3003?
>>
>> On Sun, Jun 4, 2017 at 11:39 PM, Thacker, Dharam <
>> dharam.thacker@jpmorgan.com> wrote:
>>
>>> Hi Team,
>>>
>>>
>>>
>>> Could someone help to understand how to deal with below scenario where
>>> cluster configuration service fails to start in another locator? Which
>>> supportive action should we take to rectify this?
>>>
>>>
>>>
>>> *Note*:
>>>
>>> member001.IP.MAKSED – IP address of member001
>>>
>>> member002.IP.MASKED – IP address of member002
>>>
>>>
>>>
>>> *Locator logs on member002:*
>>>
>>>
>>>
>>> [info 2017/06/05 02:07:11.941 EDT RavenLocator2 <Pooled Message
>>> Processor 1> tid=0x3d] Initializing region _ConfigurationRegion
>>>
>>>
>>>
>>> [warning 2017/06/05 02:07:11.951 EDT Locator2 <Pooled Message Processor
>>> 1> tid=0x3d] Initialization failed for Region /_ConfigurationRegion
>>>
>>> org.apache.geode.cache.persistence.ConflictingPersistentDataException:
>>> Region /_ConfigurationRegion refusing to initialize from member
>>> member001(Locator1:5160:locator)<ec><v0>:1024 with persistent data
>>> /169.87.179.46:/local/apps/shared/geode/members/Locator1/work/ConfigDiskDir_Locator1
>>> created at timestamp 1496241336712 version 0 diskStoreId
>>> 31efa18230134865-b4fd0fcbde63ade6 name Locator1 which was offline when
>>> the local data from /*member002.IP.MASKED*:/local/ap
>>> ps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2 created at
>>> timestamp 1496241344046 version 0 diskStoreId df94511d0f3d4295-91ec9286a18aaa75
>>> name Locator2 was last online
>>>
>>>         at org.apache.geode.internal.cache.persistence.PersistenceAdvis
>>> orImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:751)
>>>
>>>         at org.apache.geode.internal.cache.persistence.PersistenceAdvis
>>> orImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:812)
>>>
>>>         at org.apache.geode.internal.cache.persistence.CreatePersistent
>>> RegionProcessor.getInitialImageAdvice(CreatePersistentRegion
>>> Processor.java:52)
>>>
>>>         at org.apache.geode.internal.cache.DistributedRegion.getInitial
>>> ImageAndRecovery(DistributedRegion.java:1267)
>>>
>>>         at org.apache.geode.internal.cache.DistributedRegion.initialize
>>> (DistributedRegion.java:1101)
>>>
>>>         at org.apache.geode.internal.cache.GemFireCacheImpl.createVMReg
>>> ion(GemFireCacheImpl.java:3308)
>>>
>>>         at org.apache.geode.distributed.internal.ClusterConfigurationSe
>>> rvice.getConfigurationRegion(ClusterConfigurationService.java:709)
>>>
>>>         at org.apache.geode.distributed.internal.ClusterConfigurationSe
>>> rvice.initSharedConfiguration(ClusterConfigurationService.java:426)
>>>
>>>         at org.apache.geode.distributed.internal.InternalLocator$Shared
>>> ConfigurationRunnable.run(InternalLocator.java:649)
>>>
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>>> Executor.java:1142)
>>>
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>>> lExecutor.java:617)
>>>
>>>         at org.apache.geode.distributed.internal.DistributionManager.ru
>>> nUntilShutdown(DistributionManager.java:621)
>>>
>>>         at org.apache.geode.distributed.internal.DistributionManager$4$
>>> 1.run(DistributionManager.java:878)
>>>
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>>>
>>> [error 2017/06/05 02:07:11.959 EDT Locator2 <Pooled Message Processor 1>
>>> tid=0x3d] Error occurred while initializing cluster configuration
>>>
>>> java.lang.RuntimeException: Error occurred while initializing cluster
>>> configuration
>>>
>>>         at org.apache.geode.distributed.internal.ClusterConfigurationSe
>>> rvice.getConfigurationRegion(ClusterConfigurationService.java:722)
>>>
>>>         at org.apache.geode.distributed.internal.ClusterConfigurationSe
>>> rvice.initSharedConfiguration(ClusterConfigurationService.java:426)
>>>
>>>         at org.apache.geode.distributed.internal.InternalLocator$Shared
>>> ConfigurationRunnable.run(InternalLocator.java:649)
>>>
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>>> Executor.java:1142)
>>>
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>>> lExecutor.java:617)
>>>
>>>         at org.apache.geode.distributed.internal.DistributionManager.ru
>>> nUntilShutdown(DistributionManager.java:621)
>>>
>>>         at org.apache.geode.distributed.internal.DistributionManager$4$
>>> 1.run(DistributionManager.java:878)
>>>
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>> Caused by: org.apache.geode.cache.persistence.ConflictingPersistentDataException:
>>> Region /_ConfigurationRegion refusing to initialize from member
>>> member001(Locator1:5160:locator)<ec><v0>:1024 with persistent data /
>>> *member001.IP.MASKED*:/local/apps/shared/geode/members/Locato
>>> r1/work/ConfigDiskDir_RavenLocator1 created at timestamp 1496241336712
>>> version 0 diskStoreId 31efa18230134865-b4fd0fcbde63ade6 name
>>> RavenLocator1 which was offline when the local data from /
>>> *member002.IP.MASKED*:/local/apps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2
>>> created at timestamp 1496241344046 version 0 diskStoreId
>>> df94511d0f3d4295-91ec9286a18aaa75 name Locator2 was last online
>>>
>>>         at org.apache.geode.internal.cache.persistence.PersistenceAdvis
>>> orImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:751)
>>>
>>>         at org.apache.geode.internal.cache.persistence.PersistenceAdvis
>>> orImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:812)
>>>
>>>         at org.apache.geode.internal.cache.persistence.CreatePersistent
>>> RegionProcessor.getInitialImageAdvice(CreatePersistentRegion
>>> Processor.java:52)
>>>
>>>         at org.apache.geode.internal.cache.DistributedRegion.getInitial
>>> ImageAndRecovery(DistributedRegion.java:1267)
>>>
>>>         at org.apache.geode.internal.cache.DistributedRegion.initialize
>>> (DistributedRegion.java:1101)
>>>
>>>         at org.apache.geode.internal.cache.GemFireCacheImpl.createVMReg
>>> ion(GemFireCacheImpl.java:3308)
>>>
>>>         at org.apache.geode.distributed.internal.ClusterConfigurationSe
>>> rvice.getConfigurationRegion(ClusterConfigurationService.java:709)
>>>
>>>         ... 7 more
>>>
>>>
>>>
>>> Thanks & Regards,
>>>
>>> Dharam
>>>
>>> This message is confidential and subject to terms at: http://
>>> www.jpmorgan.com/emaildisclaimer including on confidentiality, legal
>>> privilege, viruses and monitoring of electronic messages. If you are not
>>> the intended recipient, please delete this message and notify the sender
>>> immediately. Any unauthorized use is strictly prohibited.
>>>
>>
>>
>>
>> --
>> Cheers
>>
>> Jinmei
>>
>
>


-- 

*Mark Secrist | Sr Manager, **Global Education Delivery*

msecrist@pivotal.io

970.214.4567 Mobile

  *pivotal.io <http://www.pivotal.io/>*

Follow Us: Twitter <http://www.twitter.com/pivotal> | LinkedIn
<http://www.linkedin.com/company/pivotalsoftware> | Facebook
<http://www.facebook.com/pivotalsoftware> | YouTube
<http://www.youtube.com/gopivotal> | Google+
<https://plus.google.com/105320112436428794490>

Re: How to deal with cluster configuration service failure

Posted by Darrel Schneider <ds...@pivotal.io>.
A ConflictingPersistentDataException indicates that two copies of a
disk-store where written independently of each other. When using cluster
configuration the locator uses a disk-store to write the cluster
configuration to disk. It looks like that it the disk-store that is
throwing ConflictingPersistentDataException.
One way this could happen is if you have just one locator running and it
writes the cluster config to its disk-store. You then shut that locator
down and start up a different one. It would have no knowledge of the other
locator that you shut down so it would create a brand new cluster config in
its disk-store. If at some point these two locators finally see each other
the second one to start will throw a ConflictingPersistentDataException. In
this case you need to pick which one of these disk-stores you want to be
the winner and remove the other disk store. To pick the best winner I think
each locator also writes some cache.xml files that will show you in plain
text what is in the binary disk-store files. This could also help you in
determining what configuration you will lose when you remove one of these
disk-stores. You can get that missing config back by doing the same gfsh
commands (for example create region). Another option would be to use the
gfsh import/export commands. Before deleting either disk-store start them
up one at a time and export the cluster config. Then you can start fresh by
importing the config.
You might hit a problem in which one of these disk-stores now knows about
the other so when you try to start it by itself it fails saying it is
waiting for the other to start up. Then when you do that you get the
ConflictingPersistentDataException. In that case you would not be able to
start them up one at a time to do the export so in that case you need to
find the cache.xml files. Someone who knows more about cluster config might
be able to help you more.

You should be able to avoid this in the future by making sure you start
both locators before doing your first gfsh create command. That way both
disk-stores will know about each other and will be kept in sync.

On Mon, Jun 5, 2017 at 8:07 AM, Jinmei Liao <ji...@pivotal.io> wrote:

> Is this related to https://issues.apache.org/jira/browse/GEODE-3003?
>
> On Sun, Jun 4, 2017 at 11:39 PM, Thacker, Dharam <
> dharam.thacker@jpmorgan.com> wrote:
>
>> Hi Team,
>>
>>
>>
>> Could someone help to understand how to deal with below scenario where
>> cluster configuration service fails to start in another locator? Which
>> supportive action should we take to rectify this?
>>
>>
>>
>> *Note*:
>>
>> member001.IP.MAKSED – IP address of member001
>>
>> member002.IP.MASKED – IP address of member002
>>
>>
>>
>> *Locator logs on member002:*
>>
>>
>>
>> [info 2017/06/05 02:07:11.941 EDT RavenLocator2 <Pooled Message Processor 1> tid=0x3d] Initializing region _ConfigurationRegion
>>
>> [warning 2017/06/05 02:07:11.951 EDT Locator2 <Pooled Message Processor 1> tid=0x3d] Initialization failed for Region /_ConfigurationRegion
>> org.apache.geode.cache.persistence.ConflictingPersistentDataException: Region /_ConfigurationRegion refusing to initialize from member member001(Locator1:5160:locator)<ec><v0>:1024 with persistent data /169.87.179.46:/local/apps/shared/geode/members/Locator1/work/ConfigDiskDir_Locator1 created at timestamp 1496241336712 version 0 diskStoreId 31efa18230134865-b4fd0fcbde63ade6 name Locator1 which was offline when the local data from /*member002.IP.MASKED*:/local/apps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2 created at timestamp 1496241344046 version 0 diskStoreId df94511d0f3d4295-91ec9286a18aaa75 name Locator2 was last online
>>         at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:751)
>>         at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:812)
>>         at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
>>         at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1267)
>>         at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1101)
>>         at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3308)
>>         at org.apache.geode.distributed.internal.ClusterConfigurationService.getConfigurationRegion(ClusterConfigurationService.java:709)
>>         at org.apache.geode.distributed.internal.ClusterConfigurationService.initSharedConfiguration(ClusterConfigurationService.java:426)
>>         at org.apache.geode.distributed.internal.InternalLocator$SharedConfigurationRunnable.run(InternalLocator.java:649)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at org.apache.geode.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:621)
>>         at org.apache.geode.distributed.internal.DistributionManager$4$1.run(DistributionManager.java:878)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>> [error 2017/06/05 02:07:11.959 EDT Locator2 <Pooled Message Processor 1> tid=0x3d] Error occurred while initializing cluster configuration
>> java.lang.RuntimeException: Error occurred while initializing cluster configuration
>>         at org.apache.geode.distributed.internal.ClusterConfigurationService.getConfigurationRegion(ClusterConfigurationService.java:722)
>>         at org.apache.geode.distributed.internal.ClusterConfigurationService.initSharedConfiguration(ClusterConfigurationService.java:426)
>>         at org.apache.geode.distributed.internal.InternalLocator$SharedConfigurationRunnable.run(InternalLocator.java:649)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at org.apache.geode.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:621)
>>         at org.apache.geode.distributed.internal.DistributionManager$4$1.run(DistributionManager.java:878)
>>         at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.geode.cache.persistence.ConflictingPersistentDataException: Region /_ConfigurationRegion refusing to initialize from member member001(Locator1:5160:locator)<ec><v0>:1024 with persistent data /*member001.IP.MASKED*:/local/apps/shared/geode/members/Locator1/work/ConfigDiskDir_RavenLocator1 created at timestamp 1496241336712 version 0 diskStoreId 31efa18230134865-b4fd0fcbde63ade6 name RavenLocator1 which was offline when the local data from /*member002.IP.MASKED*:/local/apps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2 created at timestamp 1496241344046 version 0 diskStoreId df94511d0f3d4295-91ec9286a18aaa75 name Locator2 was last online
>>         at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.checkMyStateOnMembers(PersistenceAdvisorImpl.java:751)
>>         at org.apache.geode.internal.cache.persistence.PersistenceAdvisorImpl.getInitialImageAdvice(PersistenceAdvisorImpl.java:812)
>>         at org.apache.geode.internal.cache.persistence.CreatePersistentRegionProcessor.getInitialImageAdvice(CreatePersistentRegionProcessor.java:52)
>>         at org.apache.geode.internal.cache.DistributedRegion.getInitialImageAndRecovery(DistributedRegion.java:1267)
>>         at org.apache.geode.internal.cache.DistributedRegion.initialize(DistributedRegion.java:1101)
>>         at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3308)
>>         at org.apache.geode.distributed.internal.ClusterConfigurationService.getConfigurationRegion(ClusterConfigurationService.java:709)
>>         ... 7 more
>>
>>
>>
>> Thanks & Regards,
>>
>> Dharam
>>
>> This message is confidential and subject to terms at: http://
>> www.jpmorgan.com/emaildisclaimer including on confidentiality, legal
>> privilege, viruses and monitoring of electronic messages. If you are not
>> the intended recipient, please delete this message and notify the sender
>> immediately. Any unauthorized use is strictly prohibited.
>>
>
>
>
> --
> Cheers
>
> Jinmei
>

Re: How to deal with cluster configuration service failure

Posted by Jinmei Liao <ji...@pivotal.io>.
Is this related to https://issues.apache.org/jira/browse/GEODE-3003?




-- 
Cheers

Jinmei
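
The exception in the logs above means each locator's cluster-configuration disk store was last written while the other locator was offline, so neither side will yield during recovery. One common way out, sketched below as a shell session, assumes Locator1 holds the configuration you want to keep; the gfsh commands are standard, but host names, ports, and paths are taken from the masked logs and will need adjusting to your environment.

```shell
# Stop the locator whose _ConfigurationRegion failed to initialize (Locator2),
# then move its stale cluster-configuration disk store out of the way
# (move rather than delete, so it can be restored if needed).
gfsh -e "stop locator --dir=/local/apps/shared/geode/members/Locator2"

mv /local/apps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2 \
   /local/apps/shared/geode/members/Locator2/work/ConfigDiskDir_Locator2.bak

# Restart Locator2 while Locator1 is online; with no local ConfigDiskDir of
# its own, it recreates _ConfigurationRegion and pulls the current cluster
# configuration from Locator1.
gfsh -e "start locator --name=Locator2 \
  --dir=/local/apps/shared/geode/members/Locator2 \
  --locators=member001[10334],member002[10334]"
```

The key point is to break the tie deliberately: pick the locator whose configuration is authoritative, and let the other one rejoin with an empty configuration disk store so it copies rather than conflicts.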