Posted to issues@hbase.apache.org by "Zheng Hu (JIRA)" <ji...@apache.org> on 2018/07/02 11:27:00 UTC

[jira] [Comment Edited] (HBASE-20789) TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky

    [ https://issues.apache.org/jira/browse/HBASE-20789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529732#comment-16529732 ] 

Zheng Hu edited comment on HBASE-20789 at 7/2/18 11:26 AM:
-----------------------------------------------------------

As [~Apache9] commented on RB, there is a problem here in patch.v3:

{code}
    if (replaceExistingCacheBlock) {
      ramCache.put(cacheKey, re);
    } else if (ramCache.putIfAbsent(cacheKey, re) != null) {
      return;
    }
{code}

We cannot simply replace the entry for cacheKey with a new RAMQueueEntry, because the heapSize of the bucket cache has to be updated whenever an entry is removed from ramCache. The WriterThread first writes to the io-engine, then syncs, then removes the RAMQueueEntry from ramCache, so the entry it removes may not be the one it wrote.

{code}
t1.   thread0 tries to cache block0 with key0 (BucketCache#cacheBlock)
t2.   block0 is put into ramCache under key0;
t3.   the writer thread writes block0 to the io-engine;
 // t4.    another thread, thread1, tries to cache block1 with the same key0 (BucketCache#cacheBlock)
 // t5.    block1 replaces block0 in ramCache
t6.   the entry with key0 is removed from ramCache, but that entry is now block1
{code}

In the end, the removal that belongs to thread0's block0 takes out the wrong entry, block1, and the heap size ends up wrong as well.
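
The interleaving above can be replayed deterministically in a single thread. The following is only a simplified sketch, not the actual BucketCache code: the class RamCacheRaceSketch, the Entry type (a stand-in for RAMQueueEntry), the block sizes, and the exact heapSize bookkeeping are all assumptions made for illustration. It contrasts the put() branch with the putIfAbsent() branch for two blocks cached under the same key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

class RamCacheRaceSketch {
  /** Hypothetical stand-in for RAMQueueEntry: a named block with a size in bytes. */
  static final class Entry {
    final String name;
    final long size;

    Entry(String name, long size) {
      this.name = name;
      this.size = size;
    }
  }

  final ConcurrentMap<String, Entry> ramCache = new ConcurrentHashMap<>();
  final AtomicLong heapSize = new AtomicLong();

  /** Same branch structure as the patch.v3 snippet; the accounting is assumed. */
  boolean cacheBlock(String key, Entry e, boolean replaceExisting) {
    if (replaceExisting) {
      ramCache.put(key, e);
    } else if (ramCache.putIfAbsent(key, e) != null) {
      return false;
    }
    heapSize.addAndGet(e.size);
    return true;
  }

  /** Writer finishing `written`: removes by key only, then adjusts heapSize (assumed). */
  Entry finishWrite(String key, Entry written) {
    Entry removed = ramCache.remove(key);
    if (removed != null) {
      heapSize.addAndGet(-written.size);
    }
    return removed;
  }

  /** Replays t1..t6 in order and reports what the writer removed. */
  static String replay(boolean replaceExisting) {
    RamCacheRaceSketch cache = new RamCacheRaceSketch();
    Entry block0 = new Entry("block0", 100);
    Entry block1 = new Entry("block1", 200);
    cache.cacheBlock("key0", block0, replaceExisting); // t1, t2
    // t3: the writer is busy writing block0 to the io-engine
    cache.cacheBlock("key0", block1, replaceExisting); // t4, t5
    Entry removed = cache.finishWrite("key0", block0); // t6
    return "removed=" + removed.name
        + " heapSize=" + cache.heapSize.get()
        + " ramCacheSize=" + cache.ramCache.size();
  }

  public static void main(String[] args) {
    System.out.println("put:         " + replay(true));
    System.out.println("putIfAbsent: " + replay(false));
  }
}
```

In this model the put() path leaves ramCache empty while heapSize still counts block1, which was never written; the putIfAbsent() path rejects block1 up front, so the writer removes exactly the entry it wrote.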

So, for safety, we still keep putIfAbsent() to ensure that only one thread removes the entry from ramCache. The flaky UT has been fixed by waiting until the cache is flushed to the io-engine.
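
As a rough illustration of "waiting until the cache is flushed", a test can poll until the key has left ramCache and appeared in the backing store before caching a second block under the same key. The sketch below is a toy model under stated assumptions, not HBase code: FlushWaitSketch, backingMap, writerQueue, waitUntilFlushed, and the timeout values are hypothetical names and numbers chosen for this example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

class FlushWaitSketch {
  final ConcurrentMap<String, String> ramCache = new ConcurrentHashMap<>();
  final ConcurrentMap<String, String> backingMap = new ConcurrentHashMap<>(); // stands in for the io-engine
  final BlockingQueue<String> writerQueue = new LinkedBlockingQueue<>();

  FlushWaitSketch() {
    Thread writer = new Thread(() -> {
      try {
        while (true) {
          String key = writerQueue.take();
          backingMap.put(key, ramCache.get(key)); // "write to the io-engine"
          ramCache.remove(key);                   // safe: putIfAbsent allows one entry per key
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    writer.setDaemon(true);
    writer.start();
  }

  /** Insert only if no entry for the key is still waiting to be flushed. */
  boolean cacheBlock(String key, String block) {
    if (ramCache.putIfAbsent(key, block) != null) {
      return false;
    }
    writerQueue.add(key);
    return true;
  }

  /** Polls until the key is in backingMap and gone from ramCache, or the timeout expires. */
  boolean waitUntilFlushed(String key, long timeoutMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    try {
      while (!backingMap.containsKey(key) || ramCache.containsKey(key)) {
        if (System.nanoTime() > deadline) {
          return false;
        }
        Thread.sleep(10);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    FlushWaitSketch cache = new FlushWaitSketch();
    System.out.println("cached block0: " + cache.cacheBlock("key0", "block0"));
    System.out.println("flushed:       " + cache.waitUntilFlushed("key0", 5000));
    // Without the wait, this call could race with the writer and be rejected.
    System.out.println("cached block1: " + cache.cacheBlock("key0", "block1"));
    System.out.println("flushed:       " + cache.waitUntilFlushed("key0", 5000));
    System.out.println("stored:        " + cache.backingMap.get("key0"));
  }
}
```

The wait turns a timing-dependent outcome into a deterministic one: once the first block has left ramCache, the second cacheBlock() call no longer depends on how fast the writer thread runs.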



> TestBucketCache#testCacheBlockNextBlockMetadataMissing is flaky
> ---------------------------------------------------------------
>
>                 Key: HBASE-20789
>                 URL: https://issues.apache.org/jira/browse/HBASE-20789
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.1.0, 1.5.0, 1.4.6, 2.0.2
>
>         Attachments: 0001-HBASE-20789-TestBucketCache-testCacheBlockNextBlockM.patch, HBASE-20789.v1.patch, HBASE-20789.v2.patch, HBASE-20789.v3.patch, bucket-33718.out
>
>
> The UT fails frequently on our internal branch-2. Will dig into the UT.


