Posted to issues@ozone.apache.org by "Wei-Chiu Chuang (Jira)" <ji...@apache.org> on 2020/10/28 18:36:00 UTC

[jira] [Commented] (HDDS-4395) Ozone Data Generator for Fast Scale Test

    [ https://issues.apache.org/jira/browse/HDDS-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222341#comment-17222341 ] 

Wei-Chiu Chuang commented on HDDS-4395:
---------------------------------------

The code is currently in my personal repo: https://github.com/jojochuang/hadoop-ozone/tree/containergen
I will rebase the code against master and then open a PR.

> Ozone Data Generator for Fast Scale Test
> ----------------------------------------
>
>                 Key: HDDS-4395
>                 URL: https://issues.apache.org/jira/browse/HDDS-4395
>             Project: Hadoop Distributed Data Store
>          Issue Type: New Feature
>          Components: Tools
>    Affects Versions: 1.0.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: Ozone Data Generator for Fast Scale Test.pdf
>
>
> I've been working on this fun project and would like to share it with the community.
>  
> h1. Synopsis
> We want to prove that Ozone runs well at scale, both in number of keys (billions) and on dense DataNodes, where each DN has hundreds of TB or even PB-scale capacity.
> h1. Challenge: Data generation
> The challenge is to generate a huge data set fast so that we can benchmark the system quickly. No existing tool can generate data at this scale.
>  
> h1. Proposal
> The major bottleneck is OM’s key insertion performance. In addition, Ozone uses a single pipeline to write data, unless multi-raft is enabled.
>  
> Instead of using Ozone's client API to generate data, we should write directly to the OM, SCM and DN RocksDB instances. RocksDB can support [up to a million key|https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks] bulk load operations.
>  
> Similarly, we can skip the normal Ozone client write path and populate the container DB and block files directly.
>  
> (more details in the design doc)
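
The proposal above amounts to generating keys offline and loading them into the key-value store in large batches, instead of issuing one client request per key. The following is a minimal sketch of that idea only, not the code in the linked repository. Everything in it is an assumption made for illustration: a plain Python dict stands in for a RocksDB instance, the "/volume/bucket/key-N" naming scheme and the placeholder value are hypothetical, and the function names are invented here.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[bytes, bytes]


def generate_sorted_keys(volume: str, bucket: str, count: int) -> Iterator[Pair]:
    """Yield (key, value) pairs in ascending key order.

    The counter is zero-padded so that lexicographic order matches numeric
    order, which means the stream is already sorted when it is produced.
    The key layout and the value are hypothetical placeholders.
    """
    width = len(str(count - 1)) if count > 0 else 1
    for i in range(count):
        name = f"/{volume}/{bucket}/key-{i:0{width}d}"
        yield name.encode("utf-8"), b"placeholder-metadata"


def batches(pairs: Iterable[Pair], batch_size: int) -> Iterator[List[Pair]]:
    """Group a stream of pairs into lists of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[Pair] = []
    for pair in pairs:
        batch.append(pair)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_load(store: Dict[bytes, bytes], pairs: Iterable[Pair],
              batch_size: int = 1000) -> int:
    """Write pairs into the store one batch at a time.

    The dict update is a stand-in for a single batched write against a real
    key-value store. Returns the number of batches written.
    """
    written = 0
    for batch in batches(pairs, batch_size):
        store.update(batch)
        written += 1
    return written


def estimate_load_seconds(num_keys: int, keys_per_second: float) -> float:
    """Back-of-envelope load time for a given sustained insertion rate."""
    return num_keys / keys_per_second


if __name__ == "__main__":
    store: Dict[bytes, bytes] = {}
    n = bulk_load(store, generate_sorted_keys("vol1", "bucket1", 2500))
    print(f"wrote {len(store)} keys in {n} batches")
    print(estimate_load_seconds(1_000_000_000, 1_000_000), "seconds")
```

If the "million key" figure is read as a sustained per-second rate (an assumption; the issue text does not state the unit), one billion keys would take on the order of 1,000 seconds per store under this estimate.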



