Posted to dev@geode.apache.org by "xiaojian zhou (JIRA)" <ji...@apache.org> on 2017/03/17 20:53:41 UTC

[jira] [Updated] (GEODE-2683) Lucene query did not match region values

     [ https://issues.apache.org/jira/browse/GEODE-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiaojian zhou updated GEODE-2683:
---------------------------------
    Description: 
There are several root causes. This one is caused by the fix in #45782, which changed the order of operations so that the primary bucket's gateway is notified before the event is distributed to the secondary bucket.

The log is at /export/buglogs_bvt/xzhou/lucene/concParRegHA-0209-235804
CLIENT vm_1_thr_17_dataStore1_ip-10-32-108-36_11189
TASK[1] parReg.ParRegTest.HydraTask_HADoEntryOps
ERROR util.TestException: util.TestException: Lucene query did not match region values. missingKeys=[], extraKeys=[Object_13, Object_17, Object_952, Object_550, Object_1876, Object_2732, Object_270, Object_4722, Object_4726, Object_2537]
at lucene.LuceneHelper.verifyLuceneIndex(LuceneHelper.java:88)
at lucene.LuceneTest.verifyLuceneIndex(LuceneTest.java:128)
at lucene.LuceneTest.verifyFromSnapshotOnly(LuceneTest.java:79)
at parReg.ParRegTest.verifyFromSnapshot(ParRegTest.java:5638)
at parReg.ParRegTest.concVerify(ParRegTest.java:6035)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at util.MethodCoordinator.executeOnce(MethodCoordinator.java:68)
at parReg.ParRegTest.HADoEntryOps(ParRegTest.java:2273)
at parReg.ParRegTest.HydraTask_HADoEntryOps(ParRegTest.java:1032)
at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
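The error line reports two key sets. This sketch assumes, as the names suggest, that missingKeys are keys whose region values should match the query but were not returned, and extraKeys are keys the query returned whose current region values do not match. Under that assumption the check reduces to two set differences. IndexRegionDiff and its methods are hypothetical names. This is not the actual LuceneHelper.verifyLuceneIndex code.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the missingKeys/extraKeys comparison;
// not the real lucene.LuceneHelper implementation.
class IndexRegionDiff {

    // Keys the region says should match, but the Lucene query did not return.
    static Set<String> missingKeys(Set<String> expected, Set<String> actual) {
        Set<String> missing = new TreeSet<>(expected);
        missing.removeAll(actual);
        return missing;
    }

    // Keys the Lucene query returned that do not match the region's current values.
    static Set<String> extraKeys(Set<String> expected, Set<String> actual) {
        Set<String> extra = new TreeSet<>(actual);
        extra.removeAll(expected);
        return extra;
    }

    public static void main(String[] args) {
        // Made-up sample data: the index still returns Object_13 although
        // its region value no longer matches the query.
        Set<String> expected = new TreeSet<>(Set.of("Object_1", "Object_2"));
        Set<String> actual = new TreeSet<>(Set.of("Object_1", "Object_2", "Object_13"));
        System.out.println("missingKeys=" + missingKeys(expected, actual)
                + ", extraKeys=" + extraKeys(expected, actual));
    }
}
```

With the sample sets in main, it prints missingKeys=[], extraKeys=[Object_13]. This has the same shape as the failure above: nothing is missing from the index, but the index returns keys whose region values have moved on.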

The root cause is:
T1: A putAll (or removeAll) operation arrives at the primary bucket on memberA.
T2: BR.virtualPut() calls handleWANEvent(), which creates a shadow key.
T3: putAll invokes the callback (i.e. writes the event into the AEQ) before distributing it. (Put and destroy do not have this problem, because they distribute before invoking the callback.)
T4: handleSuccessfulBatchDispatch sends a ParallelQueueRemovalMessage to the secondary bucket on memberB.
T5: memberB hosts the dataRegion's secondary bucket, but its brq has not been created yet because of a rebalance. ParallelQueueRemovalMessage.process() therefore only tries to remove the event from the tempQueue. The tempQueue does not contain the event, so the removal does nothing.
T6: Finally, the distribution from BR.virtualPut() arrives at the user region's secondary bucket on memberB, and the event is added to the AEQ (or to the tempQueue, depending on whether the brq has been created by then).
T7: memberB becomes the new primary (due to the rebalance) and re-dispatches the shadow key, even though memberA already processed it much earlier. The data mismatch occurs because the replayed event overwrites a newer event.
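The T1-T7 sequence can be reproduced in a single-threaded toy model. This is a minimal sketch under stated assumptions. ShadowKeyReplayRace, SecondaryQueue, Event and simulate are hypothetical names, not Geode classes. One deque stands in for memberB's brq/tempQueue, and a plain map stands in for what the Lucene index reflects. The key Object_13 is borrowed from the log. The values v1/v2 and the second, newer update are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the ordering race; none of these types are Geode APIs.
class ShadowKeyReplayRace {

    // An AEQ event; shadowKey identifies it in the parallel queue.
    record Event(long shadowKey, String key, String value) {}

    // Stand-in for memberB's secondary queue (brq or tempQueue).
    static final class SecondaryQueue {
        final Deque<Event> events = new ArrayDeque<>();

        // Stand-in for ParallelQueueRemovalMessage.process(): if the event
        // has not arrived yet, this silently does nothing (T5).
        void remove(long shadowKey) {
            events.removeIf(e -> e.shadowKey() == shadowKey);
        }

        void enqueue(Event e) {
            events.addLast(e);
        }
    }

    // Runs the timeline, fills `region`, and returns the resulting index view.
    static Map<String, String> simulate(Map<String, String> region) {
        Map<String, String> index = new HashMap<>();
        SecondaryQueue memberB = new SecondaryQueue();

        // T1-T3: putAll on memberA writes into the AEQ before distributing.
        Event older = new Event(1, "Object_13", "v1");
        region.put(older.key(), older.value());
        index.put(older.key(), older.value());  // memberA's AEQ dispatches it
        // T4-T5: the removal message reaches memberB first: a no-op.
        memberB.remove(older.shadowKey());
        // T6: the delayed distribution finally reaches memberB and is queued.
        memberB.enqueue(older);

        // A newer put on the same key follows the normal order:
        // distribute first, then callback, dispatch and removal.
        Event newer = new Event(2, "Object_13", "v2");
        region.put(newer.key(), newer.value());
        memberB.enqueue(newer);
        index.put(newer.key(), newer.value());
        memberB.remove(newer.shadowKey());

        // T7: memberB becomes primary and replays what is left in its queue.
        while (!memberB.events.isEmpty()) {
            Event e = memberB.events.pollFirst();
            index.put(e.key(), e.value());
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> region = new HashMap<>();
        Map<String, String> index = simulate(region);
        System.out.println("region=" + region + " index=" + index);
    }
}
```

Running main prints region={Object_13=v2} index={Object_13=v1}. The stale putAll event that survived on memberB overwrites the newer value in the index. In the test, that kind of divergence would plausibly surface as an extraKeys entry. The model deliberately omits threads and messaging, so it only shows why the ordering matters, not how Geode schedules these steps.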

  was:There are several root causes. This one is caused by the fix in #45782, which changed the order of operations so that the primary bucket's gateway is notified before the event is distributed to the secondary bucket.


> Lucene query did not match region values
> ----------------------------------------
>
>                 Key: GEODE-2683
>                 URL: https://issues.apache.org/jira/browse/GEODE-2683
>             Project: Geode
>          Issue Type: Bug
>            Reporter: xiaojian zhou
>            Assignee: xiaojian zhou
>             Fix For: 1.2.0
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)