Posted to dev@hive.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2014/10/01 03:17:35 UTC

[jira] [Commented] (HIVE-8258) Compactor cleaners can be starved on a busy table or partition.

    [ https://issues.apache.org/jira/browse/HIVE-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154168#comment-14154168 ] 

Alan Gates commented on HIVE-8258:
----------------------------------

bq. I don't think this is the right map to use here
Yes it is.  I'm testing if I already have an entry in the map of the compaction id to the set of associated locks.  If not, I want to go build that entry.
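
To make the intent concrete, here is a minimal sketch of that check-then-build pattern, together with the rule from the issue description that the cleaner only waits on locks that existed at the time of the compaction. The class name, method name, and the use of plain Long ids are illustrative assumptions, not the code in the patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from the patch.
class CleanerLockTracker {
    // Compaction id -> ids of the locks that were held when we first looked.
    private final Map<Long, Set<Long>> compactionToLocks = new HashMap<>();

    // Returns true once every lock recorded for this compaction has been released.
    boolean readyToClean(long compactionId, Set<Long> currentlyHeldLocks) {
        Set<Long> recorded = compactionToLocks.get(compactionId);
        if (recorded == null) {
            // No entry yet: build it from a snapshot of the locks held right now.
            recorded = new HashSet<>(currentlyHeldLocks);
            compactionToLocks.put(compactionId, recorded);
        }
        // Locks taken after the snapshot are ignored; they refer to the new files.
        for (Long lockId : recorded) {
            if (currentlyHeldLocks.contains(lockId)) {
                return false;
            }
        }
        compactionToLocks.remove(compactionId);
        return true;
    }
}
```

With this shape, a steady stream of new locks on a busy partition cannot postpone cleaning indefinitely, because only the snapshot taken on the first check is ever consulted.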

On point 4 (removing files at the same time that some reader is calling AcidUtils.getAcidState), good catch.  Looking through AcidUtils.getAcidState, I think everything will be OK except the call to findOriginals().  Apart from that, it makes one call to FileSystem.listLocatedStatus, which should return coherent results (the to-be-deleted files will either be there or not).  After that it just operates on the returned status structures, which shouldn't cause any issues.  And by definition these files won't be chosen to be read from, so even if AcidUtils.getAcidState sees them and they immediately vanish, that will be fine.

But I think there is an issue in the call to findOriginals.  It calls listLocatedStatus again because it has to recurse down to find the bucket files.  If the directory is removed between the two calls to listLocatedStatus, then the second one will throw an IOException.  This won't be caught and will fly all the way out of getAcidState, crashing the task.

We could wrap the second call to listLocatedStatus in findOriginals in a try/catch.  This will have the downside of potentially swallowing real errors.  But I don't see a better option.  [~owen.omalley], thoughts?
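
A rough sketch of what such a guard could look like follows. It uses java.nio.file as a stand-in for the Hadoop FileSystem API so that it runs on its own, and the class and method names are made up for illustration. One way to limit the risk of swallowing real errors, assuming the listing call reports a vanished directory with a distinct exception type, is to catch only that "not found" case and let every other IOException propagate.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the recursive listing; not Hive or Hadoop code.
class TolerantLister {

    // Recursively collects non-directory entries under dir into out.
    // A directory that vanished before it could be listed is treated as empty.
    static void collectFiles(Path dir, List<Path> out) throws IOException {
        List<Path> children = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path child : stream) {
                children.add(child);
            }
        } catch (NoSuchFileException e) {
            // The directory was removed by someone else (e.g. a cleaner): skip it.
            return;
        }
        // Any other IOException above is deliberately not caught.
        for (Path child : children) {
            if (Files.isDirectory(child)) {
                collectFiles(child, out);
            } else {
                out.add(child);
            }
        }
    }
}
```

Whether the same narrowing is possible in findOriginals depends on which exception listLocatedStatus actually raises for a missing directory on the file systems in use, so treat this as a starting point for discussion rather than a fix.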

> Compactor cleaners can be starved on a busy table or partition.
> ---------------------------------------------------------------
>
>                 Key: HIVE-8258
>                 URL: https://issues.apache.org/jira/browse/HIVE-8258
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 0.13.1
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Critical
>         Attachments: HIVE-8258.patch
>
>
> Currently the cleaning thread in the compactor does not run on a table or partition while any locks are held on this partition.  This leaves it open to starvation in the case of a busy table or partition.  It only needs to wait until all locks on the table/partition at the time of the compaction have expired.  Any jobs initiated after that (and thus any locks obtained) will be for the new versions of the files.


