Posted to issues@hbase.apache.org by "Viraj Jasani (Jira)" <ji...@apache.org> on 2019/12/08 17:41:00 UTC

[jira] [Commented] (HBASE-22457) Harden the HBase HFile reader reference counting

    [ https://issues.apache.org/jira/browse/HBASE-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990932#comment-16990932 ] 

Viraj Jasani commented on HBASE-22457:
--------------------------------------

{quote} # detect a run-away refCount by comparing any reader's refCount with the actual number of open scanners (which we track for each HRegion); if the refCount is larger, we know we have a problem.
 # (a variation) when we attempt to archive an HFile that has a non-zero refCount, check whether there are any open scanners; if not, archive anyway.

For #1 at least we could enhance the logging and include the number of currently open scanners in the log (where we say that we cannot archive an HFile)
{quote}
For #1, since open scanners are tracked at the HRegion level (RegionScannerImpl) and not at the Store level, we might not be able to compare refCount with the number of open scanners? #2 might also not hold, because the open scanners counted at the region level may belong to non-compacted store files rather than to the store in question? I was wondering whether we could also track the number of open scanners at the store level.

I was just going through the comments here while looking into HBASE-23349 (refCount 1 preventing archival of compacted store files).
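
A minimal sketch of what store-level tracking could look like, to make the comparison concrete. TrackedReader and StoreScannerTracker are hypothetical stand-ins, not existing HBase classes, and the two counters are read without a common lock, so a real check would have to tolerate transient mismatches.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-ins; these are not HBase classes.
final class TrackedReader {
    private final AtomicInteger refCount = new AtomicInteger();

    void incrementRefCount() { refCount.incrementAndGet(); }
    void decrementRefCount() { refCount.decrementAndGet(); }
    int getRefCount() { return refCount.get(); }
}

// Assumed per-store counter of open scanners (today the count is kept per region).
final class StoreScannerTracker {
    private final AtomicInteger openScanners = new AtomicInteger();

    void scannerOpened() { openScanners.incrementAndGet(); }
    void scannerClosed() { openScanners.decrementAndGet(); }
    int getOpenScanners() { return openScanners.get(); }

    /** Idea #1: a reader with more references than the store has open scanners looks leaked. */
    boolean isRefCountLeaked(TrackedReader reader) {
        return reader.getRefCount() > openScanners.get();
    }

    /** Idea #2: allow archival when no scanner is open on the store, even if refCount > 0. */
    boolean canArchive(TrackedReader reader) {
        return reader.getRefCount() == 0 || openScanners.get() == 0;
    }
}
```

With a counter like this, the "cannot archive" log line could also print getOpenScanners() next to the reader's refCount.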

> Harden the HBase HFile reader reference counting
> ------------------------------------------------
>
>                 Key: HBASE-22457
>                 URL: https://issues.apache.org/jira/browse/HBASE-22457
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>            Priority: Major
>         Attachments: 22457-random-1.5.txt
>
>
> The problem is that any coprocessor hook that replaces a passed scanner without closing it can cause an incorrect reference count.
> This was bad and wrong before of course, but now it has pretty bad consequences, since an incorrect reference count will prevent HFiles from being archived indefinitely.
> All hooks that are passed a scanner and return a scanner are suspect, since the returned scanner may or may not close the passed scanner:
> * preCompact
> * preCompactScannerOpen
> * preFlush
> * preFlushScannerOpen
> * preScannerOpen
> * preStoreScannerOpen
> * preStoreFileReaderOpen...? (not sure about this one, it could mess with the reader)
> I sampled the Phoenix and also Tephra code, and found a few instances where this is happening.
> And for those I filed issues: TEPHRA-300, PHOENIX-5291
> (We're not using Tephra)
> The Phoenix ones should be rare. In our case we are seeing readers with refCount > 1000.
> Perhaps there are other issues too, such as a path where not all exceptions are caught and a scanner is left open that way. (Generally I am not a fan of reference counting in complex systems - it's too easy to miss something. But that's a different discussion. :) ).
> Let's brainstorm some way in which we can harden this.
> [~ram_krish], [~anoop.hbase], [~apurtell]
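
The leak described in the issue can be reproduced in miniature. The sketch below uses hypothetical stand-in types (SimpleScanner, ReaderScanner, Hooks), not the HBase coprocessor API: one hook drops the passed scanner without closing it, the other returns a replacement that closes it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for illustration only; these are not HBase types.
interface SimpleScanner {
    void close();
}

/** Scanner that holds one reference on a shared counter while it is open. */
final class ReaderScanner implements SimpleScanner {
    private final AtomicInteger refCount;
    private boolean closed;

    ReaderScanner(AtomicInteger refCount) {
        this.refCount = refCount;
        refCount.incrementAndGet();
    }

    @Override
    public void close() {
        if (!closed) {               // idempotent: release the reference at most once
            closed = true;
            refCount.decrementAndGet();
        }
    }
}

final class Hooks {
    /** Leaky: returns a replacement and never closes the passed scanner. */
    static SimpleScanner leakyHook(SimpleScanner passed) {
        return () -> { };
    }

    /** Safe: closing the replacement also closes the passed scanner. */
    static SimpleScanner delegatingHook(SimpleScanner passed) {
        return passed::close;
    }
}
```

After the scanner returned by leakyHook is closed, the counter still reads 1, which is the situation that keeps an HFile from being archived; with delegatingHook it drops back to 0.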


