Posted to oak-issues@jackrabbit.apache.org by "Davide Giannella (JIRA)" <ji...@apache.org> on 2019/04/09 10:37:13 UTC

[jira] [Updated] (OAK-6254) DataStore: API to retrieve approximate storage size

     [ https://issues.apache.org/jira/browse/OAK-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davide Giannella updated OAK-6254:
----------------------------------
    Fix Version/s: 1.14.0

> DataStore: API to retrieve approximate storage size
> ---------------------------------------------------
>
>                 Key: OAK-6254
>                 URL: https://issues.apache.org/jira/browse/OAK-6254
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob
>            Reporter: Thomas Mueller
>            Priority: Major
>             Fix For: 1.12.0, 1.14.0
>
>
> The estimated size of the datastore (on disk) is needed to:
> * monitor growth over time, or growth of certain operations
> * monitor if garbage collection is effective
> * avoid running out of disk space
> * estimate backup size
> * statistical purposes (for example, if there are many repositories, to group them by size)
> Datastore size: we could use the following heuristic: sum the sizes of the files in ./datastore/00/00 (if it exists) and multiply by 65536, or sum the sizes of the files in ./datastore/00 and multiply by 256 (00 is one of 256 two-hex-digit subdirectories, 00/00 one of 65536). That would give a rough estimate (within about 20% for repositories with a datastore size above 50 GB).
> I think this is mainly important for the FileDataStore. For the S3 datastore, a simple and fast S3 API to read the size would be good as well; if there is none, returning "unknown" is fine for me.
> As for the API, I would use something like this: {{long getEstimatedStorageSize(int accuracyLevel)}}, with accuracyLevel 1 for inaccurate (fastest), 2 for more accurate (slower), ..., and 9 for precise (possibly very slow), similar to [java.util.zip.Deflater.setLevel|https://docs.oracle.com/javase/7/docs/api/java/util/zip/Deflater.html#setLevel(int)].
> I would expect it to take up to 1 second for accuracyLevel 1, up to 5 seconds for accuracyLevel 2, and possibly hours for level 9.
> With level 1, I would read the files in 00/00; with levels 2 to 8, the files in 00; and with level 9, all files. For level 1, I wouldn't stop early; for level 2, if it takes more than 5 seconds, I would stop and return the current best estimate.
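The sampling heuristic and the proposed accuracy levels can be sketched together in Java. This is a minimal sketch under stated assumptions, not Oak code: the names `StorageSizeEstimator`, `FileDataStoreSizeEstimator` and the `UNKNOWN` (-1) sentinel are hypothetical; it assumes blob files are spread evenly over two-hex-digit subdirectories; levels 3 to 8 get no time limit; and "current best estimate" for level 2 is interpreted as the 00/00 estimate when the scan of 00 exceeds its 5-second budget.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

/** Hypothetical interface mirroring the proposed method; not an existing Oak API. */
interface StorageSizeEstimator {
    /** Assumed sentinel for "unknown", e.g. a backend without a fast size API. */
    long UNKNOWN = -1L;

    /** accuracyLevel: 1 = fastest/roughest ... 9 = precise (possibly very slow). */
    long getEstimatedStorageSize(int accuracyLevel);
}

/**
 * Sketch for a FileDataStore-like tree. Assumes blobs are spread evenly over
 * two-hex-digit subdirectories, so "00" holds ~1/256 and "00/00" ~1/65536 of the data.
 */
class FileDataStoreSizeEstimator implements StorageSizeEstimator {
    private final Path root;
    private final long level2BudgetNanos; // time budget for level 2 only

    FileDataStoreSizeEstimator(Path root) {
        this(root, TimeUnit.SECONDS.toNanos(5));
    }

    FileDataStoreSizeEstimator(Path root, long level2BudgetNanos) {
        this.root = root;
        this.level2BudgetNanos = level2BudgetNanos;
    }

    @Override
    public long getEstimatedStorageSize(int accuracyLevel) {
        if (accuracyLevel < 1 || accuracyLevel > 9) {
            throw new IllegalArgumentException("accuracyLevel must be 1..9: " + accuracyLevel);
        }
        try {
            if (accuracyLevel == 9) {
                return sumFileSizes(root, -1); // precise: read every file, no time limit
            }
            Path coarse = root.resolve("00");
            Path fine = coarse.resolve("00");
            // Quick estimate: 00/00 sample scaled by 65536, never interrupted.
            long quick = Files.isDirectory(fine) ? sumFileSizes(fine, -1) * 65536L : UNKNOWN;
            if (accuracyLevel == 1 && quick != UNKNOWN) {
                return quick;
            }
            if (!Files.isDirectory(coarse)) {
                return quick; // nothing better to sample
            }
            // 00 sample scaled by 256. Only level 2 is time-limited; on timeout,
            // fall back to the quick estimate as the "current best estimate".
            long budget = accuracyLevel == 2 ? level2BudgetNanos : -1;
            long coarseSum = sumFileSizes(coarse, budget);
            return coarseSum == UNKNOWN ? quick : coarseSum * 256L;
        } catch (IOException | UncheckedIOException e) {
            return UNKNOWN;
        }
    }

    /**
     * Sums regular-file sizes under dir. A negative budgetNanos means no limit;
     * otherwise returns UNKNOWN as soon as the elapsed time reaches the budget.
     */
    private static long sumFileSizes(Path dir, long budgetNanos) throws IOException {
        long start = System.nanoTime();
        long total = 0;
        try (Stream<Path> paths = Files.walk(dir)) {
            Iterator<Path> it = paths.iterator();
            while (it.hasNext()) {
                if (budgetNanos >= 0 && System.nanoTime() - start >= budgetNanos) {
                    return UNKNOWN;
                }
                Path p = it.next();
                if (Files.isRegularFile(p)) {
                    total += Files.size(p);
                }
            }
        }
        return total;
    }
}
```

For example, `new FileDataStoreSizeEstimator(Paths.get("repository/datastore")).getEstimatedStorageSize(2)` (the path is hypothetical) would scan `00` for at most about 5 seconds. Returning a sentinel rather than throwing keeps the "unknown is fine" option open for an S3-backed store that has no cheap size call.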



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)