You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "chenglei (Jira)" <ji...@apache.org> on 2021/11/19 03:38:00 UTC

[jira] [Assigned] (HBASE-26465) MemStoreLAB may be released early when its SegmentScanner is scanning

     [ https://issues.apache.org/jira/browse/HBASE-26465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenglei reassigned HBASE-26465:
--------------------------------

    Assignee: chenglei

> MemStoreLAB may be released early when its SegmentScanner is scanning
> ---------------------------------------------------------------------
>
>                 Key: HBASE-26465
>                 URL: https://issues.apache.org/jira/browse/HBASE-26465
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha-1, 2.4.8
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Critical
>
> HBASE-26144 moved {{MemStore.clearSnapshot}} out of write lock of {{HStore#lock}}, but because {{MemStore.clearSnapshot}}  closed {{DefaultMemStore#snapshot}} which may be used by {{DefaultMemStore#getScanners}}, when {{DefaultMemStore#getScanners}} and {{MemStore}} flushing execute conurrently, {{MemStoreLAB}} used by {{DefaultMemStore#snapshot}}  may be released early when its {{SegmentScanner}} is scanning.
> Considering follow thread sequences: 
> 1.The flush thread starts {{DefaultMemStore}} flushing after some cells have be added to {{DefaultMemStore}}.
> 2.The flush thread stopping before {{DefaultMemStore#clearSnapshot}} in
>    {{HStore#updateStorefiles}} after completed flushing memStore to hfile.
> 3.The scan thread starts and stopping after {{DefaultMemStore.snapshot.getAllSegments}} in
>    {{DefaultMemStore#getScanners}},here the scan thread gets the {{DefaultMemStore#snapshot}} which is created by the flush thread.
> 4.The flush thread continues {{DefaultMemStore#clearSnapshot}} and close {{DefaultMemStore#snapshot}},because the reference count of the corresponding  {{MemStoreLABImpl}} is 0, the Chunks in corresponding {{MemStoreLABImpl}} are recycled.
> 5.The scan thread continues {{DefaultMemStore#getScanners}}, and create a {{SegmentScanner}} for this {{DefaultMemStore#snapshot}}, and increase the reference count of the corresponding {{MemStoreLABImpl}}, but {{Chunks}} in corresponding {{MemStoreLABImpl}} are recycled by step 4, and these {{Chunks}} may be overwritten by other write threads, which may cause serious problem.
> I simulate above case in my PR, and my Fix in the PR is as following:
> 1.Moving  {{MemStore.clearSnapshot}} back under write lock of {{HStore#lock}}, because {{DefaultMemStore#getScanners}} is protected under the read lock of {{HStore#lock}}, so {{DefaultMemStore#getScanners}} and
>    {{DefaultMemStore#clearSnapshot}} could not execute concurrently.
> 2. Introducing {{RefCnt}} into {{MemStoreLABImpl}}  to replace old reference count, so checking and increasing or decreasing is done in atomicity and if there is illegal state in  reference count, it would throw exception rather than corrupt data.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)