Posted to issues@hbase.apache.org by "Josh Elser (JIRA)" <ji...@apache.org> on 2018/07/26 20:53:00 UTC
[jira] [Commented] (HBASE-20962) LogStream Metadata Tracking

    [ https://issues.apache.org/jira/browse/HBASE-20962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558878#comment-16558878 ] 

Josh Elser commented on HBASE-20962:
------------------------------------

Posed this question to [~enis] in email. Let me try to paraphrase what he suggested:

[https://bookkeeper.apache.org/distributedlog/docs/0.4.0-incubating/user_guide/architecture/main.html#id3]

!https://bookkeeper.apache.org/distributedlog/docs/0.4.0-incubating/images/datamodel.png!

For the DistributedLog data model, we have "log segments"; a log stream is a sequence of log segments, and each log stream belongs to a namespace.

For Ratis, we'd be looking at a "log segment" being one raft ring/quorum. The Ratis LogService would give HBase the LogStream API (abstracting away the "physical" data on disk) – one region would have one LogStream. All of the operations that HBase would want to do would be at the LogStream level, never the log-segment level.

I believe Enis was suggesting that we use RocksDB to manage the log segments on a given RS (e.g. knowing how to construct readers/writers and how to truncate data), and then a metadata-level raft ring/quorum to track which LogStreams exist on other nodes.
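
To make the shape of this concrete, here is a rough Java sketch of the data model as I understand it: a namespace holds LogStreams, one LogStream per region, and each LogStream is an ordered sequence of log segments, each backed by one raft ring/quorum. Every class, field, and method name below is hypothetical and is not taken from Ratis, DistributedLog, or HBase. The in-memory list and map are only stand-ins for what RocksDB (per-RS segment tracking) and a metadata-level raft quorum (cluster-wide LogStream tracking) would hold in a real design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical sketch only: none of these types exist in Ratis or HBase.
public class LogStreamMetadataSketch {

    /** One log segment; in this proposal it maps to one raft ring/quorum. */
    public static final class LogSegment {
        public final long segmentId;
        public final String raftGroupId; // assumed identifier for the quorum

        public LogSegment(long segmentId, String raftGroupId) {
            this.segmentId = segmentId;
            this.raftGroupId = raftGroupId;
        }
    }

    /** A LogStream: an ordered sequence of segments, one stream per region. */
    public static final class LogStream {
        public final String namespace;
        public final String regionName;
        // Stand-in for the per-RS segment metadata that RocksDB might hold.
        private final List<LogSegment> segments = new ArrayList<>();

        public LogStream(String namespace, String regionName) {
            this.namespace = namespace;
            this.regionName = regionName;
        }

        public synchronized void appendSegment(LogSegment segment) {
            segments.add(segment);
        }

        /** Returns a read-only snapshot of the segments in append order. */
        public synchronized List<LogSegment> segments() {
            return Collections.unmodifiableList(new ArrayList<>(segments));
        }

        public String key() {
            return namespace + "/" + regionName;
        }
    }

    /** Stand-in for the metadata-level quorum: which server hosts which LogStream. */
    public static final class LogStreamRegistry {
        private final Map<String, String> streamToServer = new ConcurrentHashMap<>();

        public void register(LogStream stream, String serverName) {
            streamToServer.put(stream.key(), serverName);
        }

        /** Lists the keys of all LogStreams registered to the given server, sorted. */
        public List<String> streamsOn(String serverName) {
            return streamToServer.entrySet().stream()
                    .filter(e -> e.getValue().equals(serverName))
                    .map(Map.Entry::getKey)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        LogStream stream = new LogStream("default", "region-1");
        stream.appendSegment(new LogSegment(0L, "raft-group-a"));

        LogStreamRegistry registry = new LogStreamRegistry();
        registry.register(stream, "rs1.example.com");
        System.out.println(registry.streamsOn("rs1.example.com"));
    }
}
```

Callers only ever touch LogStream and the registry, never an individual segment's storage, which matches the idea that HBase operates at the LogStream level. How segment metadata would actually be laid out in RocksDB, and how the registry would be replicated, are left open here.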

> LogStream Metadata Tracking
> ---------------------------
>
>                 Key: HBASE-20962
>                 URL: https://issues.apache.org/jira/browse/HBASE-20962
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Josh Elser
>            Priority: Major
>
> An open question is about how HBase would track these LogService-backed WALs.
> Presently, HBase uses server-names and a well-known directory in HDFS to know what WALs exist. Since we are not relying on HDFS (or a distributed filesystem), we need to come up with something else.
> [~sergey soldatov] made a good suggestion today: we could implement another Ratis StateMachine designed specifically to manage the state of LogStreams "in HBase". This information should be relatively "small" (WRT the amount of data in each LogStream), so we can avoid the kinds of problems described in HBASE-20961 around re-introducing a failed peer to the quorum. This is the best idea I've heard so far on the matter.
> The other obvious candidate would be ZooKeeper, but this is probably a non-starter as it would mean keeping persistent data there (which is an HBase anti-pattern).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)