Posted to server-dev@james.apache.org by "Benoit Tellier (Jira)" <se...@james.apache.org> on 2021/09/08 13:30:00 UTC

[jira] [Commented] (JAMES-3150) Implement Garbage Collection for blobs

    [ https://issues.apache.org/jira/browse/JAMES-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411930#comment-17411930 ] 

Benoit Tellier commented on JAMES-3150:
---------------------------------------

Run 1: many deletions, listing batch size of 1,000 blobs at a time

About 24.5 hours for 35M deletions out of 67M blobs.

{code:java}
{
  "additionalInformation": {
    "type": "BlobGCTask",
    "timestamp": "2021-09-08T08:14:34.883206Z",
    "referenceSourceCount": 49035372,
    "blobCount": 67546056,
    "gcedBlobCount": 34933325,
    "errorCount": 0,
    "bloomFilterExpectedBlobCount": 100000000,
    "bloomFilterAssociatedProbability": 0.02
  },
  "status": "completed",
  "taskId": "4b981c40-0d1f-4c9a-9bf9-d0aae5779647",
  "startedDate": "2021-09-07T07:50:05.648+0000",
  "completedDate": "2021-09-08T08:14:35.038+0000",
  "executedOn": "james-jmap-988b8f869-cwxwn",
  "submittedFrom": "james-jmap-988b8f869-cwxwn",
  "cancelledFrom": null,
  "submitDate": "2021-09-07T07:50:05.576+0000",
  "type": "BlobGCTask"
}
{code}
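
For context on the two bloomFilter* fields in the report, here is a rough sketch of what they imply for memory, assuming the filter is sized with the textbook optimal formulas (m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions). The function name is made up for illustration, and the actual implementation used by the task may size things differently.

```python
import math


def bloom_filter_size(expected_items: int, false_positive_rate: float):
    """Return (bits, hash_count) using the textbook optimal sizing formulas.

    Assumption: these are the standard formulas, not necessarily the exact
    sizing logic used by BlobGCTask.
    """
    bits = math.ceil(
        -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    hash_count = max(1, round(bits / expected_items * math.log(2)))
    return bits, hash_count


# Values taken from the task report: 100M expected blobs, 2% false positives.
bits, hash_count = bloom_filter_size(100_000_000, 0.02)
print(f"{bits / 8 / 1024 ** 2:.0f} MiB, {hash_count} hash functions")
```

A false positive here means an unreferenced blob is mistaken for a referenced one and kept, so with a probability of 0.02 a small share of garbage can survive a given run.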

Run 2: few deletions, listing batch size of 10,000 blobs at a time

About 2 hours for 32M blobs, 3,251 deletions.

{code:java}
{
  "additionalInformation": {
    "type": "BlobGCTask",
    "timestamp": "2021-09-08T12:46:59.008272Z",
    "referenceSourceCount": 49035372,
    "blobCount": 32612766,
    "gcedBlobCount": 3251,
    "errorCount": 0,
    "bloomFilterExpectedBlobCount": 67546056,
    "bloomFilterAssociatedProbability": 0.02
  },
  "status": "completed",
  "type": "BlobGCTask",
  "taskId": "01dd426c-7c03-467e-a25f-5426b618773b",
  "startedDate": "2021-09-08T10:49:58.916+0000",
  "completedDate": "2021-09-08T12:46:59.055+0000",
  "executedOn": "james-jmap-84bb8c66c5-qsdpf",
  "submittedFrom": "james-imap-smtp-c4fdffbdd-vwffh",
  "cancelledFrom": null,
  "submitDate": "2021-09-08T10:49:58.762+0000"
}
{code}
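
To compare the two runs on a common scale, here is a small sketch that derives blobs processed per second from the startedDate, completedDate and blobCount fields of the two reports. Using blobCount as the unit of work is an assumption: each run also reads the reference sources to populate the bloom filter, and that time is included in the elapsed duration.

```python
from datetime import datetime

# Timestamp layout used by startedDate / completedDate in the task reports.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def blobs_per_second(started: str, completed: str, blob_count: int) -> float:
    """Average number of blobs processed per second over the whole task."""
    elapsed = datetime.strptime(completed, DATE_FORMAT) - datetime.strptime(
        started, DATE_FORMAT
    )
    return blob_count / elapsed.total_seconds()


# Figures copied from the two reports.
run1 = blobs_per_second(
    "2021-09-07T07:50:05.648+0000", "2021-09-08T08:14:35.038+0000", 67_546_056
)
run2 = blobs_per_second(
    "2021-09-08T10:49:58.916+0000", "2021-09-08T12:46:59.055+0000", 32_612_766
)

print(f"run 1: {run1:.0f} blobs/s")
print(f"run 2: {run2:.0f} blobs/s")
print(f"run 2 is {run2 / run1:.1f}x faster per blob")
```

The two runs differ in both batch size and number of deletions, so this ratio alone does not say which of the two factors explains the gap.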

We will do a *third run* tomorrow with a listing batch size of 1,000 blobs and no expected deletions. This will let us determine which factor made run 1 slow: the small page size or the deletions.

We could also plan a *fourth run* to explore whether increasing the blob listing batch size beyond 10,000 improves performance further.


> Implement Garbage Collection for blobs
> --------------------------------------
>
>                 Key: JAMES-3150
>                 URL: https://issues.apache.org/jira/browse/JAMES-3150
>             Project: James Server
>          Issue Type: Improvement
>          Components: Blob
>    Affects Versions: 3.3.0
>            Reporter: Gautier DI FOLCO
>            Priority: Major
>          Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> With the blob store deduplication, dropping a blob in a distributed environment is impossible if we want to keep an acceptable concurrency level.
> A Garbage Collector should be created in order to drop old blobs.


