You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Michael Stack (Jira)" <ji...@apache.org> on 2020/12/04 17:58:00 UTC
[jira] [Commented] (HBASE-24749) Direct insert HFiles and Persist in-memory HFile tracking

    [ https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244166#comment-17244166 ] 

Michael Stack commented on HBASE-24749:
---------------------------------------

Took a look at the design. Looks good. Fun. Still talks of hbase:storefile table. Is that intentional? The design also proposes using zk as a persistent store (See invariants for devs in [http://hbase.apache.org/book.html#design.invariants).]

A bit more detail on RefreshHFilesClient would be appreciated... here or as amend to design. We'd run this tool to make the dfsclient listing view align with hbase's understanding?


The testing section is interesting. s3 is a predicate. Anything we can do to make sure that a check-in doesn't break hbase-on-s3? Perhaps we need to add to our nightly runs hbase on s3? Or, a PR that touches fs code runs the s3 tests?

Thanks

 

 

> Direct insert HFiles and Persist in-memory HFile tracking
> ---------------------------------------------------------
>
>                 Key: HBASE-24749
>                 URL: https://issues.apache.org/jira/browse/HBASE-24749
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Compaction, HFile
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major
>              Labels: design, discussion, objectstore, storeFile, storeengine
>         Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}} directory used in the commit stage for common HFile operations such as flush and compaction to improve the write throughput and latency on object stores. Specifically for S3 filesystems, this will also mitigate read-after-write inconsistencies caused by immediate HFiles validation after moving the HFile(s) to data directory.
> Please see attached for this proposal and the initial result captured with 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and RUN, and workload C RUN result.
> The goal of this JIRA is to discuss with the community if the proposed improvement on the object stores use case makes senses and if we miss anything should be included.
> Improvement Highlights
>  1. Lower write latency, especially the p99+
>  2. Higher write throughput on flush and compaction 
>  3. Lower MTTR on region (re)open or assignment 
>  4. Remove consistent check dependencies (e.g. DynamoDB) supported by file system implementation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)