Posted to oak-issues@jackrabbit.apache.org by "Vikas Saurabh (JIRA)" <ji...@apache.org> on 2017/12/18 10:24:00 UTC

[jira] [Comment Edited] (OAK-7066) Active deletion blob list files can grow too large due to inlined blobs

    [ https://issues.apache.org/jira/browse/OAK-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16294750#comment-16294750 ] 

Vikas Saurabh edited comment on OAK-7066 at 12/18/17 10:23 AM:
---------------------------------------------------------------

Fixed in trunk at [r1818545|https://svn.apache.org/r1818545].

FTR: I've used {{InMemoryDataRecord#isInstance}}, so inlined binaries from a data store won't make it into the blob ids recorded for active deletion. OTOH, inlined binaries from a blob store will always be recorded.
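
As a rough illustration of the filtering idea (not the change committed in r1818545), the sketch below drops inlined ids before they are recorded. It is written in Python for brevity, although Oak itself is Java. {{is_inlined}}, {{INLINE_PREFIX}} and {{record_deleted_blob_ids}} are hypothetical names, and the "0x" prefix test is only an assumed stand-in for whatever {{InMemoryDataRecord#isInstance}} actually checks.

```python
from typing import Iterable, List

# Assumption: inlined (in-memory) blob ids carry a recognisable prefix.
# "0x" is a placeholder standing in for the real InMemoryDataRecord check.
INLINE_PREFIX = "0x"


def is_inlined(blob_id: str) -> bool:
    """Hypothetical stand-in for InMemoryDataRecord#isInstance."""
    return blob_id.startswith(INLINE_PREFIX)


def record_deleted_blob_ids(blob_ids: Iterable[str]) -> List[str]:
    """Return only the ids that should be tracked for active deletion.

    Ids recognised as inlined are skipped, so their (potentially very
    long) id strings never reach the deleted-blobs list file.
    """
    return [blob_id for blob_id in blob_ids if not is_inlined(blob_id)]


if __name__ == "__main__":
    # Made-up ids, purely for illustration.
    ids = ["0x48656c6c6f", "a1b2c3d4#1024", "0x00ff", "deadbeef#2048"]
    print(record_deleted_blob_ids(ids))
```

With a filter of this shape, only ids that fail the inlined check are written out, which is why blob store ids that the check does not recognise would still be recorded.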


was (Author: catholicon):
Fixed in trunk at [r1818545|https://svn.apache.org/r1818545].

> Active deletion blob list files can grow too large due to inlined blobs
> -----------------------------------------------------------------------
>
>                 Key: OAK-7066
>                 URL: https://issues.apache.org/jira/browse/OAK-7066
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>            Priority: Blocker
>             Fix For: 1.7.13, 1.8.
>
>
> This is a follow-up to OAK-7052, where we noticed that the deleted-blob list files collected by the active deletion logic can grow very large because of inlined blobs.
> One potential fix (not sure how yet, though) is to not actively delete inlined blobs.
> Here are some stats that might help us decide (based on the raw numbers collected at \[0]):
> ||file-name||large_lines||large_size||small_lines||small_size||small_lines/total_lines||small_size/total_size||
> |blobs-1512664032264.txt|245301|3310224358|173096|35473656|0.413712335413495|0.010602766852107|
> |blobs-1512698405656.txt|370373|4443957885|256775|52997864|0.409432861142824|0.011785275852845|
> |blobs-1512987450004.txt|660669|6214740439|461168|92017554|0.411082893504137|0.014590309966251|
> |blobs-1513130410963.txt|569083|5490965583|406756|80124598|0.416826956085994|0.014382211631264|
> |blobs-1513216819447.txt|69876|1413561892|46238|9221956|0.398212101899857|0.006481628262061|
> \[0]:
> file sizes
> {noformat}
> repository/index/deleted-blobs$ ls -l blobs-151*
> -rw-r--r-- 1 root root 3369065620 Dec  8 01:59 blobs-1512664032264.txt
> -rw-r--r-- 1 root root 4532250073 Dec  9 01:59 blobs-1512698405656.txt
> -rw-r--r-- 1 root root 6370201955 Dec 13 01:59 blobs-1512987450004.txt
> -rw-r--r-- 1 root root 1916223582 Dec 13 11:52 blobs-1513130410963.txt
> {noformat}
> number of entries
> {noformat}
> repository/index/deleted-blobs$ wc -l blobs-151*
>      418397 blobs-1512664032264.txt
>      627148 blobs-1512698405656.txt
>     1121837 blobs-1512987450004.txt
>      308292 blobs-1513130410963.txt
>     2475674 total
> {noformat}
> number of entries and sizes split on threshold of 500 bytes of blob ids
> {noformat}
> repository/index/deleted-blobs$ for i in blobs-151*;do echo $i;awk 'BEGIN {FS="|"} {len = length($1); if (len > 500) {large++; largeSize+=len} else {small++; smallSize+=len}} END {print large, largeSize, small, smallSize}' $i;done
> blobs-1512664032264.txt
> 245301 3310224358 173096 35473656
> blobs-1512698405656.txt
> 370373 4443957885 256775 52997864
> blobs-1512987450004.txt
> 660669 6214740439 461168 92017554
> blobs-1513130410963.txt
> 569083 5490965583 406756 80124598
> blobs-1513216819447.txt
> 69876 1413561892 46238 9221956
> {noformat}
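
The two ratio columns in the quoted table follow from the awk output by simple division. A minimal Python sketch (the function name {{small_fractions}} is hypothetical) reproduces them for blobs-1512664032264.txt:

```python
from typing import Tuple


def small_fractions(large_lines: int, large_size: int,
                    small_lines: int, small_size: int) -> Tuple[float, float]:
    """Return (small_lines/total_lines, small_size/total_size)."""
    total_lines = large_lines + small_lines
    total_size = large_size + small_size
    return small_lines / total_lines, small_size / total_size


# Values taken from the awk output for blobs-1512664032264.txt.
lines_frac, size_frac = small_fractions(245301, 3310224358, 173096, 35473656)
print(f"{lines_frac:.6f} {size_frac:.6f}")
```

For that file, ids of 500 bytes or less make up roughly 41% of the entries but only about 1% of the bytes.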


