Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2018/01/23 18:51:50 UTC

[GitHub] sijie commented on a change in pull request #1025: BP-26: Move the development of distributedlog library to bookkeeper

sijie commented on a change in pull request #1025: BP-26: Move the development of distributedlog library to bookkeeper
URL: https://github.com/apache/bookkeeper/pull/1025#discussion_r163340815
 
 

 ##########
 File path: site/bps/BP-26-move-distributedlog-core-library.md
 ##########
 @@ -0,0 +1,60 @@
+---
+title: "BP-26: Move distributedlog library as part of bookkeeper"
+issue: https://github.com/apache/bookkeeper/1024
+state: 'Under Discussion'
+release: "N/A"
+---
+
+### Motivation
+
+DistributedLog is an extension of Apache BookKeeper, which offers *reopenable* log streams as its storage primitives.
+It is built directly on bookkeeper ledgers and provides an easier-to-use abstraction and API. Applications
+can use *named* log streams rather than *numbered* ledgers to store their data. For example, users can use log streams
+as files to store objects, checkpoints, and other more general filesystem-related use cases.
+
+Moving the distributedlog core library into bookkeeper would have the following benefits:
+
+- It provides a more generic "reopenable" log abstraction. It lowers the barrier for people to use bookkeeper to store
+  data, and brings more use cases into the bookkeeper ecosystem.
+- Using ledgers to build a continuous log stream is a pattern that has been reimplemented multiple times in multiple places,
+  from older projects like the HDFS namenode log manager and Hedwig to newer projects like DistributedLog and Pulsar.
+- Most distributedlog usage goes through the distributedlog library, which depends only on Apache BookKeeper and introduces no
+  additional components. To simplify those usages, it is better to release the distributedlog library along with
+  bookkeeper. It provides better integration and a better release procedure.
+
+This proposal proposes "moving the distributedlog library code base as part of bookkeeper and continuing the library
+development in bookkeeper".
+
+### Public Interfaces
+
+This is a new library moved into bookkeeper. It will *NOT* touch any existing bookkeeper modules or the ledger API.
+
+### Proposed Changes
+
+This proposal will *ONLY* move the following library-only modules from the distributedlog repo:
+
+- distributedlog-core: the log stream library that builds on the bookkeeper ledger API. It doesn't introduce any service components. Library only.
+- distributedlog-io/dlfs: an HDFS filesystem API wrapper over the log stream API, to provide filesystem-like usage over bookkeeper.
+
+This proposal will *NOT* move other service components like "distributedlog-proxy".
 
 Review comment:
   @ivankelly No, this proposal only covers moving the library, not the proxy service, for a few reasons. 1) What bookkeeper needs is a *distributedlog*-ish library to simplify the usage of bookkeeper; the *ledger* api is not usable for most users. 2) Moving only the library minimizes the impact on other bookkeeper users who don't need dlog. 3) The dlog proxy was written with twitter-finagle, which depends on a specialized libthrift that is only available in the twitter maven repository. So whether to move the dlog proxy or improve it with grpc is a broader discussion than just moving the dlog library. I would defer that to a later discussion.
   
   Hope this makes things clear.
