Posted to dev@bookkeeper.apache.org by Nicholas Nezis <ni...@gmail.com> on 2022/02/27 22:43:57 UTC

Code review request: Heron's use of Bookkeeper

Dear Bookkeeper devs,

I was wondering if anyone would be willing to review Apache Heron's use of
Bookkeeper as a storage mechanism. The choice to use Bookkeeper predated my
joining the team, and I worry that we are not using it properly.

Apache Heron is a streaming analytics framework, and it uses BK in two
primary ways:

1. Uploader/Downloader. When an analytic is submitted to a Heron cluster,
its binary artifacts are uploaded to Bookkeeper. Then, when the analytic's
StatefulSet is created, each pod downloads the binary artifact from
Bookkeeper.
  a. Uploader:
https://github.com/apache/incubator-heron/tree/master/heron/uploaders/src/java/org/apache/heron/uploader/dlog
  b. Downloader:
https://github.com/apache/incubator-heron/blob/master/heron/downloaders/src/java/org/apache/heron/downloader/DLDownloader.java

2. Stateful Storage: BK is used for storing checkpoint data which can be
retrieved for recovery.
https://github.com/apache/incubator-heron/blob/master/heron/statefulstorages/src/java/org/apache/heron/statefulstorage/dlog/DlogStorage.java
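To make item 1 concrete for reviewers, here is a minimal Java sketch of the upload/download round trip at a conceptual level: the uploader splits an artifact into records appended to a log, and each pod's downloader reads the records back in append order and concatenates them. This is not Heron's code and not the DistributedLog API. `AppendOnlyLog`, `InMemoryLog`, and `ArtifactTransfer` are hypothetical stand-ins (the in-memory log only lets the sketch run without a BookKeeper cluster), and the 4 KiB record size is an arbitrary assumption.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an append-only log stream; NOT the DistributedLog API. */
interface AppendOnlyLog {
  void append(byte[] chunk);

  /** Returns every record in the order it was appended. */
  List<byte[]> readAll();
}

/** In-memory log so the sketch runs without a BookKeeper cluster. */
final class InMemoryLog implements AppendOnlyLog {
  private final List<byte[]> records = new ArrayList<>();

  @Override
  public void append(byte[] chunk) {
    records.add(chunk.clone()); // defensive copy of the caller's buffer
  }

  @Override
  public List<byte[]> readAll() {
    return new ArrayList<>(records);
  }
}

final class ArtifactTransfer {
  static final int CHUNK_SIZE = 4096; // assumed record size, not a Heron or BK setting

  private ArtifactTransfer() {}

  /** Uploader side: split the artifact into fixed-size records, appended in order. */
  static int upload(byte[] artifact, AppendOnlyLog log) {
    int chunks = 0;
    for (int offset = 0; offset < artifact.length; offset += CHUNK_SIZE) {
      int end = Math.min(offset + CHUNK_SIZE, artifact.length);
      log.append(Arrays.copyOfRange(artifact, offset, end));
      chunks++;
    }
    return chunks;
  }

  /** Downloader side: read the records back in append order and concatenate them. */
  static byte[] download(AppendOnlyLog log) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] chunk : log.readAll()) {
      out.write(chunk, 0, chunk.length);
    }
    return out.toByteArray();
  }
}
```

Against a real cluster, the in-memory log would be replaced by a DistributedLog stream writer and reader; the sketch only illustrates the append-in-order, read-in-order contract that the downloader depends on.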
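Similarly, a minimal sketch of the checkpoint storage in item 2, assuming one log per (topology, checkpoint, component, task) and a dispose step that deletes checkpoints older than the oldest one still needed for recovery. The class name, the `topology/checkpointId/component/taskId` naming scheme, and the assumption that checkpoint ids sort lexicographically in creation order are all hypothetical, not taken from DlogStorage.java. In a scheme like this, if old checkpoints are never disposed (or disposal never reaches BookKeeper), storage only grows, which may be worth checking alongside the config items mentioned in this post.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical checkpoint store sketch; the naming scheme is an assumption, not DlogStorage's. */
final class CheckpointStoreSketch {
  // One "log" per (topology, checkpoint, component, task); a map stands in for BookKeeper.
  private final Map<String, byte[]> logs = new TreeMap<>();

  static String logName(String topology, String checkpointId, String component, int taskId) {
    return topology + "/" + checkpointId + "/" + component + "/" + taskId;
  }

  void store(String topology, String checkpointId, String component, int taskId, byte[] state) {
    logs.put(logName(topology, checkpointId, component, taskId), state.clone());
  }

  Optional<byte[]> restore(String topology, String checkpointId, String component, int taskId) {
    byte[] state = logs.get(logName(topology, checkpointId, component, taskId));
    return state == null ? Optional.empty() : Optional.of(state.clone());
  }

  /**
   * Deletes every checkpoint of the topology older than oldestToKeep.
   * Assumes checkpoint ids sort lexicographically in creation order and that
   * topology names and checkpoint ids contain no '/'.
   */
  int dispose(String topology, String oldestToKeep) {
    String prefix = topology + "/";
    int removed = 0;
    Iterator<String> it = logs.keySet().iterator();
    while (it.hasNext()) {
      String key = it.next();
      if (!key.startsWith(prefix)) {
        continue; // belongs to a different topology
      }
      String checkpointId = key.substring(prefix.length(), key.indexOf('/', prefix.length()));
      if (checkpointId.compareTo(oldestToKeep) < 0) {
        it.remove(); // in a real deployment, the BK-backed log would be deleted here
        removed++;
      }
    }
    return removed;
  }

  int size() {
    return logs.size();
  }
}
```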

In addition to the code, in which various size- and time-based rolling is
disabled, we have a few interesting config items that were added to address
a bug in which Bookkeeper was filling up. I suspect these settings are
incorrect.
https://github.com/apache/incubator-heron/blob/ebd7ceaeb7cb4aeddf21e8a51a233d53e2afca0d/deploy/kubernetes/helm/values.yaml.template#L93-L95

Any assistance in reviewing our use of the distributedlog API would be
greatly appreciated.

Nick