Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2014/12/04 00:30:12 UTC

[jira] [Commented] (HBASE-12626) Archive cleaner cannot keep up; it maxes out at about 400k deletes/hour

    [ https://issues.apache.org/jira/browse/HBASE-12626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233669#comment-14233669 ] 

stack commented on HBASE-12626:
-------------------------------

Ideas:

+ Rather than a single thread doing a delete every 9ms, run multiple threads to do deletes in parallel, either in the Master process or by farming out deletes so the RSs (RegionServers) run them.
+ Add a bulk delete function to the NN (NameNode) so we could pass it batches to remove.
+ Files are grouped by column family and then by region. When there are lots of files, delete whole region dirs rather than individual files if all files are past the TTL (this would give us a bit of NN 'batching').  See below:

{code}
drwxr-xr-x   - stack supergroup          0 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList
drwxr-xr-x   - stack supergroup          0 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/17ac14014ffaf06eec8f4ca8f8a8093c
drwxr-xr-x   - stack supergroup          0 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/17ac14014ffaf06eec8f4ca8f8a8093c/meta
-rw-r--r--   3 stack supergroup   53257814 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/17ac14014ffaf06eec8f4ca8f8a8093c/meta/82b3d9f0ed0e4787a75036cf3e3e7165
-rw-r--r--   3 stack supergroup   53144343 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/17ac14014ffaf06eec8f4ca8f8a8093c/meta/8e3652741bc044bb8de90d3da44e73da
-rw-r--r--   3 stack supergroup  155406596 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/17ac14014ffaf06eec8f4ca8f8a8093c/meta/c81d8bd73eba41049a24065fe8a92335
-rw-r--r--   3 stack supergroup   40131260 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/17ac14014ffaf06eec8f4ca8f8a8093c/meta/e10bdea983744cbe9499a7f2a1ac73af
drwxr-xr-x   - stack supergroup          0 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/1addf0a136f3a461323dad87a5229838
drwxr-xr-x   - stack supergroup          0 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/1addf0a136f3a461323dad87a5229838/meta
-rw-r--r--   3 stack supergroup   53229613 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/1addf0a136f3a461323dad87a5229838/meta/2928456b642d4bed8432ff215c3adc28
-rw-r--r--   3 stack supergroup   53134686 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/1addf0a136f3a461323dad87a5229838/meta/515453c6dbd54e228932dd1ca4a3cd66
-rw-r--r--   3 stack supergroup  155390925 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/1addf0a136f3a461323dad87a5229838/meta/61d591035d6f4950a0fd974223e1defb
-rw-r--r--   3 stack supergroup   40082545 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/1addf0a136f3a461323dad87a5229838/meta/a03366c8862143d4bf77b81a83c73872
drwxr-xr-x   - stack supergroup          0 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/4fb408938e6e26021159b6f66239edf3
drwxr-xr-x   - stack supergroup          0 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/4fb408938e6e26021159b6f66239edf3/meta
-rw-r--r--   3 stack supergroup   53305889 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/4fb408938e6e26021159b6f66239edf3/meta/1132adf2d4a2422f8781502e1055d6d1
-rw-r--r--   3 stack supergroup   53146535 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/4fb408938e6e26021159b6f66239edf3/meta/2e988c9287a0412194f1f9050ded934b
-rw-r--r--   3 stack supergroup   40005595 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/4fb408938e6e26021159b6f66239edf3/meta/3fdd18de2eb2488f84f90846c0f6f578
-rw-r--r--   3 stack supergroup  155409996 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/4fb408938e6e26021159b6f66239edf3/meta/84750360a0bc463c9fa8023355da37ad
drwxr-xr-x   - stack supergroup          0 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/967b45186c5cf3e8e61ad9babd61768e
drwxr-xr-x   - stack supergroup          0 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/967b45186c5cf3e8e61ad9babd61768e/meta
-rw-r--r--   3 stack supergroup   36320770 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/967b45186c5cf3e8e61ad9babd61768e/meta/6d4baec2118b4aa482e283f4972f4b65
-rw-r--r--   3 stack supergroup  157346642 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/967b45186c5cf3e8e61ad9babd61768e/meta/7b1e80e2c26848e3bdb43abc7377bd42
-rw-r--r--   3 stack supergroup   55010361 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/967b45186c5cf3e8e61ad9babd61768e/meta/8f91251906c0407ba0d26f6c5e78baef
-rw-r--r--   3 stack supergroup   53156631 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/967b45186c5cf3e8e61ad9babd61768e/meta/d51d9d2f7fab407eb196b08555304ffc
drwxr-xr-x   - stack supergroup          0 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/e8441872044807da79dab2130c1337b2
drwxr-xr-x   - stack supergroup          0 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/e8441872044807da79dab2130c1337b2/meta
-rw-r--r--   3 stack supergroup   40012654 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/e8441872044807da79dab2130c1337b2/meta/30395d1e04b2427aa9665817b85ca593
-rw-r--r--   3 stack supergroup   53248791 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/e8441872044807da79dab2130c1337b2/meta/36785dda11f048c2ab4215329aed189d
-rw-r--r--   3 stack supergroup  155438135 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/e8441872044807da79dab2130c1337b2/meta/5e811190c0b546598ce46f470a5d0d17
-rw-r--r--   3 stack supergroup   53139133 2014-12-03 12:46 /hbase/archive/data/default/IntegrationTestBigLinkedList/e8441872044807da79dab2130c1337b2/meta/bb1b9c541bde41f7a2c287c62e0e3caa
{code}
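
A rough sketch of how the first and third ideas might combine (illustrative only, not the actual cleaner code). At one delete every 9ms a single thread tops out at 3,600,000ms / 9ms = 400,000 deletes an hour, so the sketch hands each region dir to a thread pool and removes the whole dir only when every file under it is past the TTL. Assumptions: it runs against the local filesystem via java.nio as a stand-in for HDFS, and the names ArchiveCleanerSketch, allFilesExpired, deleteRecursively and cleanTable are made up for illustration. On HDFS the recursive removal would presumably be a single FileSystem.delete(path, true) call, which is where the NN 'batching' would come from; the local stand-in below still deletes file by file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

// Hypothetical sketch: local-filesystem stand-in for the archive dir on HDFS.
public class ArchiveCleanerSketch {

    /** True if every regular file under regionDir was last modified before cutoffMillis.
     *  A region dir holding no files at all also counts as expired. */
    static boolean allFilesExpired(Path regionDir, long cutoffMillis) throws IOException {
        try (Stream<Path> paths = Files.walk(regionDir)) {
            return paths.filter(Files::isRegularFile)
                        .allMatch(p -> p.toFile().lastModified() < cutoffMillis);
        }
    }

    /** Deletes dir and everything under it, children first. */
    static void deleteRecursively(Path dir) throws IOException {
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path d, IOException e) throws IOException {
                if (e != null) {
                    throw e;
                }
                Files.delete(d);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    /** Checks each region dir under tableDir on a pool of the given size and
     *  removes those whose files are all older than ttlMillis.
     *  Returns the number of region dirs removed. */
    static int cleanTable(Path tableDir, long ttlMillis, int threads) throws Exception {
        long cutoff = System.currentTimeMillis() - ttlMillis;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            try (DirectoryStream<Path> regions =
                     Files.newDirectoryStream(tableDir, Files::isDirectory)) {
                for (Path region : regions) {
                    results.add(pool.submit(() -> {
                        if (!allFilesExpired(region, cutoff)) {
                            return false; // at least one file is still inside the TTL
                        }
                        deleteRecursively(region);
                        return true;
                    }));
                }
            }
            int removed = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) {
                    removed++;
                }
            }
            return removed;
        } finally {
            pool.shutdown();
        }
    }
}
```

Something like cleanTable(tableDir, ttlMillis, 8) would be called once per table dir under the archive. Region dirs that still hold any file inside the TTL are left alone and would fall back to the existing per-file cleaning.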

> Archive cleaner cannot keep up; it maxes out at about 400k deletes/hour
> -----------------------------------------------------------------------
>
>                 Key: HBASE-12626
>                 URL: https://issues.apache.org/jira/browse/HBASE-12626
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scaling
>    Affects Versions: 0.94.25
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>
> On big clusters, it is possible to overrun the archive cleaning thread.  Make it able to do more work per cycle when needed.
> We saw this on a user's cluster. The rate at which files were being moved to the archive exceeded our delete rate, such that the archive had tens of millions of files, putting friction on all cluster ops.
> The cluster had ~500 nodes.  It was RAM constrained (other processes on the box also need RAM). Over a period of days, the loading was thrown off kilter because it started taking double writes while going from one schema to another (the cluster was running hot before the double loading).  The master was deleting an archived file every 9ms on average, about 400k deletes an hour.  The constrained RAM and their having 4-5 column families had them creating files in excess of this rate, so we backed up.
> For some helpful background/input, see the dev thread http://search-hadoop.com/m/DHED4UYSF9


