You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Hudson (Jira)" <ji...@apache.org> on 2021/12/18 14:54:00 UTC

[jira] [Commented] (HBASE-26488) Memory leak when MemStore retry flushing

    [ https://issues.apache.org/jira/browse/HBASE-26488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461911#comment-17461911 ] 

Hudson commented on HBASE-26488:
--------------------------------

Results for branch branch-2.5
	[build #6 on builds.a.o|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/6/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/6/General_20Nightly_20Build_20Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/6/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/6/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hadoop.apache.org/job/HBase/job/HBase%20Nightly/job/branch-2.5/6/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Memory leak when MemStore retry flushing
> ----------------------------------------
>
>                 Key: HBASE-26488
>                 URL: https://issues.apache.org/jira/browse/HBASE-26488
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 3.0.0-alpha-1, 2.4.8
>            Reporter: chenglei
>            Assignee: chenglei
>            Priority: Major
>              Labels: regionserver
>             Fix For: 2.5.0, 3.0.0-alpha-3, 2.4.10
>
>
> {{HStore.flushCache}} use following {{DefaultStoreFlusher.flushSnapshot}} to flush memStore and retry this method when this method throws exception(the default retry count is 10),but increasing reference count of {{MemStoreLABImpl}} which used by {{DefaultMemStore.snapshot}} in following line 55 and decreasing reference count of {{MemStoreLABImpl}} in following line 82 is asymmetrical:
> {code:java}
> 47 public List<Path> flushSnapshot(MemStoreSnapshot snapshot, long cacheFlushId,
> 48      MonitoredTask status, ThroughputController throughputController,
> 49      FlushLifeCycleTracker tracker) throws IOException {
> 50    ArrayList<Path> result = new ArrayList<>();
> 51   int cellsCount = snapshot.getCellsCount();
> 52    if (cellsCount == 0) return result; // don't flush if there are no entries
> 53
> 54    // Use a store scanner to find which rows to flush.
> 55    InternalScanner scanner = createScanner(snapshot.getScanners(), tracker);
> 56    StoreFileWriter writer;
> 57    try {
> 58      // TODO:  We can fail in the below block before we complete adding this flush to
> 59      //        list of store files.  Add cleanup of anything put on filesystem if we fail.
> 60      synchronized (flushLock) {
> 61        status.setStatus("Flushing " + store + ": creating writer");
> 62       // Write the map out to the disk
> 63        writer = store.createWriterInTmp(cellsCount,
> 64            store.getColumnFamilyDescriptor().getCompressionType(), false, true,
> 65           snapshot.isTagsPresent(), false);
> 66        IOException e = null;
> 67        try {
> 68          performFlush(scanner, writer, throughputController);
> 69        } catch (IOException ioe) {
> 70          e = ioe;
> 71          // throw the exception out
> 72          throw ioe;
> 73        } finally {
> 74          if (e != null) {
> 75            writer.close();
> 76          } else {
> 77            finalizeWriter(writer, cacheFlushId, status);
> 78          }
> 79        }
> 80      }
> 81    } finally {
> 82      scanner.close();
> 83    }
> ......
> {code}
> The asymmetry because above line 55 {{snapshot.getScanners}} ,which is following {{MemStoreSnapshot.getScanners}}:
> {code:java}
>        public List<KeyValueScanner> getScanners() {
>                return scanners;
>        }
> {code}
> {{MemStoreSnapshot.getScanners}} does not create new {{SegmentScanner}}, instead it reuses the  {{SegmentScanner}} in following line 43 created in {{MemStoreSnapshot}} ctor:
> {code:java}
> 38 public MemStoreSnapshot(long id, ImmutableSegment snapshot) {
> 39    this.id = id;
> 40    this.cellsCount = snapshot.getCellsCount();
> 41    this.memStoreSize = snapshot.getMemStoreSize();
> 42    this.timeRangeTracker = snapshot.getTimeRangeTracker();
> 43    this.scanners = snapshot.getSnapshotScanners();
> 44    this.tagsPresent = snapshot.isTagsPresent();
> 45 }
> {code}
> So when flushing MemStore by {{DefaultStoreFlusher.flushSnapshot}} at first time, the reference count of {{MemStoreLABImpl}}  which used by {{DefaultMemStore.snapshot}} is 1, and if {{DefaultStoreFlusher.flushSnapshot}} throws exception, the {{SegmentScanner}} of {{DefaultMemStore.snapshot}}  is closed at
> the  end of {{DefaultStoreFlusher.flushSnapshot}} , and the reference count of corresponding {{MemStoreLABImpl}} is decreased to 0.
> When retry flushing {{MemStore}} by {{DefaultStoreFlusher.flushSnapshot}} at second time, if it success, the {{SegmentScanner}} of {{DefaultMemStore.snapshot}}  is closed again at the  end of {{DefaultStoreFlusher.flushSnapshot}} , and the reference count of corresponding {{MemStoreLABImpl}} is decreased to -1. When flush is completed, {{DefaultMemStore.clearSnapshot}} is called to close {{DefaultMemStore.snapshot}} and corresponding {{MemStoreLABImpl}} , but in the following line 268, reference count is -1 not 0, so  {{MemStoreLABImpl.recycleChunks}} is not called and the Chunks used by this {{MemStoreLABImpl}}  is leaked.
> {code:java}
> 261  public void close() {
> 262    if (!this.closed.compareAndSet(false, true)) {
> 263      return;
> 264    }
> 265    // We could put back the chunks to pool for reusing only when there is no
> 266    // opening scanner which will read their data
> 267    int count  = openScannerCount.get();
> 268    if(count == 0) {
> 269      recycleChunks();
> 270    }
> 271  }
> {code}
> My fix is for {{MemStoreSnapshot.getScanners}}, we call {{snapshot.getSnapshotScanners}} to create a new {{SegmentScanner}}, and we do not create {{SegmentScanner}} in {{MemStoreSnapshot}} ctor.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)