Posted to issues@ozone.apache.org by "Sammi Chen (Jira)" <ji...@apache.org> on 2020/06/03 02:26:00 UTC

[jira] [Updated] (HDDS-3658) Stop persist container related pipeline info of each key into OM DB to reduce DB size

     [ https://issues.apache.org/jira/browse/HDDS-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sammi Chen updated HDDS-3658:
-----------------------------
    Summary: Stop persist container related pipeline info of each key into OM DB to reduce DB size  (was: Remove container location information when persist key info into OM DB to reduce meta data db size)

> Stop persist container related pipeline info of each key into OM DB to reduce DB size
> -------------------------------------------------------------------------------------
>
>                 Key: HDDS-3658
>                 URL: https://issues.apache.org/jira/browse/HDDS-3658
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major
>
> Below are the results of an investigation into serialized key size, using RATIS replication with three replicas. The following examples are quoted from the output of the "ozone sh key info" command, which does not show the pipeline information associated with each key location element.
> 1. Empty key, serialized size 113 bytes
> hadoop/bucket/user/root/terasort/10G-input-7/_SUCCESS
> {
>   "volumeName" : "hadoop",
>   "bucketName" : "bucket",
>   "name" : "user/root/terasort/10G-input-7/_SUCCESS",
>   "dataSize" : 0,
>   "creationTime" : "2019-11-21T13:53:11.330Z",
>   "modificationTime" : "2019-11-21T13:53:11.361Z",
>   "replicationType" : "RATIS",
>   "replicationFactor" : 3,
>   "ozoneKeyLocations" : [ ],
>   "metadata" : { },
>   "fileEncryptionInfo" : null
> }
> 2. Key with one chunk of data, serialized size 661 bytes
> hadoop/bucket/user/root/terasort/10G-input-6/part-m-00037
> {
>   "volumeName" : "hadoop",
>   "bucketName" : "bucket",
>   "name" : "user/root/terasort/10G-input-6/part-m-00037",
>   "dataSize" : 223696200,
>   "creationTime" : "2019-11-18T07:47:58.254Z",
>   "modificationTime" : "2019-11-18T07:53:52.066Z",
>   "replicationType" : "RATIS",
>   "replicationFactor" : 3,
>   "ozoneKeyLocations" : [ {
>     "containerID" : 7,
>     "localID" : 103157811003588713,
>     "length" : 223696200,
>     "offset" : 0
>   } ],
>   "metadata" : { },
>   "fileEncryptionInfo" : null
> }
> 3. Key with two chunks of data, serialized size 1205 bytes
> ozone sh key info hadoop/bucket/user/root/terasort/10G-input-7/part-m-00027
> {
>   "volumeName" : "hadoop",
>   "bucketName" : "bucket",
>   "name" : "user/root/terasort/10G-input-7/part-m-00027",
>   "dataSize" : 223696200,
>   "creationTime" : "2019-11-21T13:47:07.653Z",
>   "modificationTime" : "2019-11-21T13:53:07.964Z",
>   "replicationType" : "RATIS",
>   "replicationFactor" : 3,
>   "ozoneKeyLocations" : [ {
>     "containerID" : 221,
>     "localID" : 103176210196201501,
>     "length" : 134217728,
>     "offset" : 0
>   }, {
>     "containerID" : 222,
>     "localID" : 103176231767375926,
>     "length" : 89478472,
>     "offset" : 0
>   } ],
>   "metadata" : { },
>   "fileEncryptionInfo" : null
> }
> When a client reads a key, the "refreshPipeline" option controls whether to fetch up-to-date container location info from SCM.
> Currently this option is always set to true, which makes the container location info saved in the OM DB useless.
> Another motivation comes from running Nanda's tool for the OM performance test: with 1 billion (1000 million) keys, each key with 1 replica and metadata for 2 chunks, the total RocksDB directory size is 65.5 GB. One of our customers' clusters needs to store 10 billion objects. In that case the DB size is approximately 65.5 GB * 10 / 2 * 3 = 982.5 GB, i.e. ~1 TB.
> The goal of this task is to discard the container location info when persisting a key to the OM DB, to save DB space.

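The serialized sizes in the three examples of the issue description can be compared directly. The Python sketch below copies those sizes and computes how much each additional key location element adds. Reading that difference as mostly pipeline info is an inference (the command output hides that info), not a measurement reported in the issue.

```python
from typing import Dict, List

# Serialized key sizes in bytes, copied from the three examples,
# indexed by the number of elements in "ozoneKeyLocations".
serialized_size: Dict[int, int] = {0: 113, 1: 661, 2: 1205}


def bytes_per_extra_location(sizes: Dict[int, int]) -> List[int]:
    """Return the size increase contributed by each additional key location."""
    counts = sorted(sizes)
    return [sizes[b] - sizes[a] for a, b in zip(counts, counts[1:])]


deltas = bytes_per_extra_location(serialized_size)
print(deltas)  # [548, 544]
```

Both deltas are around 545 bytes, while each printed location carries only four numeric fields (containerID, localID, length, offset). This suggests that most of the per-location cost is the pipeline information that "ozone sh key info" does not display.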

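The DB size estimate in the issue description can be reproduced step by step. This sketch only re-does the arithmetic. The issue does not say what the /2 and *3 factors stand for; the comments give one plausible reading (one chunk per key instead of two, three replicas instead of one), and that reading is an assumption.

```python
measured_gb = 65.5       # RocksDB directory size for 1 billion keys (from the issue)
key_factor = 10          # 10 billion objects instead of 1 billion keys
location_factor = 1 / 2  # assumption: the "/2" factor, e.g. 1 chunk per key instead of 2
replica_factor = 3       # assumption: the "*3" factor, e.g. 3 replicas instead of 1

estimated_gb = measured_gb * key_factor * location_factor * replica_factor
print(f"{estimated_gb:.1f} GB")  # 982.5 GB, roughly 1 TB
```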

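A minimal sketch of the proposed change. It assumes key locations are plain dicts shaped like the "ozoneKeyLocations" entries in the examples plus a hypothetical "pipeline" field, and uses JSON as a stand-in for the real OM DB serialization. The helper names to_db_record, from_db_record and fake_scm are hypothetical, not Ozone APIs. The pipeline is dropped before persisting and filled back in from SCM on read, mirroring the refreshPipeline=true read path described in the issue.

```python
import json
from typing import Callable, Dict, List


def to_db_record(locations: List[Dict]) -> bytes:
    """Serialize key locations for the OM DB, leaving out pipeline info."""
    stripped = [{k: v for k, v in loc.items() if k != "pipeline"}
                for loc in locations]
    return json.dumps(stripped, sort_keys=True).encode("utf-8")


def from_db_record(record: bytes,
                   scm_get_pipeline: Callable[[int], Dict]) -> List[Dict]:
    """Deserialize key locations and attach the current pipeline from SCM."""
    locations = json.loads(record.decode("utf-8"))
    for loc in locations:
        loc["pipeline"] = scm_get_pipeline(loc["containerID"])
    return locations


def fake_scm(container_id: int) -> Dict:
    # Hypothetical SCM lookup returning a made-up three-node pipeline.
    return {"nodes": ["dn1", "dn2", "dn3"]}


location = {"containerID": 7, "localID": 103157811003588713,
            "length": 223696200, "offset": 0,
            "pipeline": fake_scm(7)}

with_pipeline = json.dumps([location], sort_keys=True).encode("utf-8")
without_pipeline = to_db_record([location])
print(len(with_pipeline) - len(without_pipeline), "bytes saved")

restored = from_db_record(without_pipeline, fake_scm)
```

Because readers already refresh the pipeline from SCM, dropping it at write time loses nothing on the read path in this model, while every persisted location gets smaller.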
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
