Posted to oak-dev@jackrabbit.apache.org by Chetan Mehrotra <ch...@gmail.com> on 2013/07/15 11:13:49 UTC

Very large blobid with MongoMK

Hi,

I was trying to get an estimate of the size [1] of the various node
documents in MongoDB for a fresh CQ installation. The largest node was
for path '4:/oak:index/lucene/:data/_5_Lucene41_0.tim', weighing
upward of 6 MB. It has one binary property, jcr:data:

"jcr:data" : {
		"r13fd2c82e10-0-1" : "[\":blobId:00fc1f3fd76c1715424c4f4.....00031aea1\"]"
}

The value stored above is very large. Before digging in further, I
wanted to check whether this is expected or whether the blob id should
be something smaller.

Regards
Chetan

[1]
// Mongo shell: find the largest document (by BSON size) in the nodes collection.
var max = 0;
var maxObj = null;
db.nodes.find().forEach(function (obj) {
    var curr = Object.bsonsize(obj);
    if (curr > max) {
        max = curr;
        maxObj = obj;
    }
});
print(max);
printjson(maxObj);
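
The script above tracks only the single largest document. A minimal sketch of a variant that returns the N largest is shown here; `largestDocuments` and its parameters are hypothetical names, not part of Oak or the mongo shell. The size function is passed in, so `Object.bsonsize` can be used inside the mongo shell while a stand-in is used elsewhere.

```javascript
// Hypothetical helper: return the n largest documents with their sizes.
// sizeOf is any function mapping a document to a number, for example
// Object.bsonsize when running inside the mongo shell.
function largestDocuments(docs, sizeOf, n) {
    var sized = docs.map(function (doc) {
        return { size: sizeOf(doc), doc: doc };
    });
    // Sort descending by size and keep the first n entries.
    sized.sort(function (a, b) { return b.size - a.size; });
    return sized.slice(0, n);
}

// Stand-in size function for use outside the mongo shell (assumption:
// the JSON length is only a rough proxy for the BSON size).
function jsonSize(doc) {
    return JSON.stringify(doc).length;
}

var sampleDocs = [{ a: 1 }, { a: "xxxxxxxx" }, { a: "xx" }];
var top = largestDocuments(sampleDocs, jsonSize, 2);

// In the mongo shell this could be called as (loads all documents into memory):
//   largestDocuments(db.nodes.find().toArray(), Object.bsonsize, 5)
```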

Chetan Mehrotra

Re: Very large blobid with MongoMK

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Mon, Jul 15, 2013 at 2:19 PM, Chetan Mehrotra
<ch...@gmail.com> wrote:
> Should we tweak either of limits so that these blobs do not get
> inlined and instead saved separately otherwise I think this array
> would soon outgrow the 16 MB limit?

I adjusted the Lucene block size to 10kB in revision 1503177.

BR,

Jukka Zitting

Re: Very large blobid with MongoMK

Posted by Chetan Mehrotra <ch...@gmail.com>.
> This is not just one blob id, but a list of blob ids. It's 796 blob ids,
> each blob id is 8 KB. Total size is about 6 MB.

OK, so it is a multi-valued property. In OakDirectory the Lucene
block size is set to 4092, while minBlockSize in AbstractBlobStore
is 4096. So effectively all the Lucene index blocks are getting
inlined.
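
To illustrate the interaction of the two limits, here is a minimal sketch. It assumes that a block strictly shorter than minBlockSize is stored inside the id itself, and that an inlined entry is laid out as one type byte, a variable-length integer holding the block length, and the data, all hex-encoded. That layout is an assumption and is not confirmed in this thread; it is, however, consistent with the id in the original post, which starts with "00fc1f": under this layout "fc 1f" decodes to 4092. All function names are hypothetical.

```javascript
// Assumed rule: blocks shorter than minBlockSize are inlined into the id.
function isInlined(blockLength, minBlockSize) {
    return blockLength < minBlockSize;
}

// Decode a little-endian base-128 varint from an array of byte values.
function decodeVarInt(bytes) {
    var value = 0;
    var shift = 0;
    for (var i = 0; i < bytes.length; i++) {
        value += (bytes[i] & 0x7f) * Math.pow(2, shift);
        shift += 7;
        if ((bytes[i] & 0x80) === 0) {
            break; // high bit clear: last byte of the varint
        }
    }
    return value;
}

// Number of bytes needed to encode n as a varint.
function varIntLength(n) {
    var len = 1;
    while (n >= 0x80) {
        n = Math.floor(n / 128);
        len++;
    }
    return len;
}

// Hex characters used by one inlined block under the assumed layout:
// type byte + varint length + data, two hex characters per byte.
function inlinedIdChars(blockLength) {
    return 2 * (1 + varIntLength(blockLength) + blockLength);
}

var luceneBlockInlined = isInlined(4092, 4096);   // true
var lengthFromId = decodeVarInt([0xfc, 0x1f]);    // 4092
var charsPerBlock = inlinedIdChars(4092);         // 8190, about 8 KB
```

Under these assumptions every 4092-byte Lucene block becomes roughly 8 KB of id text, which matches the per-id size reported in the thread.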

Should we tweak either of these limits so that the blobs are saved
separately instead of being inlined? Otherwise I think this array
would soon outgrow the 16 MB MongoDB document limit.
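
As a rough estimate of the headroom, the sketch below uses only figures from this thread (about 8 KB per inlined id, 4092-byte blocks, 16 MB limit). It ignores the per-entry overhead of the array (prefix, quotes, escaping) and everything else stored in the document, so the real ceiling would be somewhat lower. The names are hypothetical.

```javascript
// Figures taken from the thread; per-entry overhead is ignored.
var MONGO_DOC_LIMIT = 16 * 1024 * 1024; // 16 MB document limit, in bytes
var ID_SIZE = 8 * 1024;                 // about 8 KB per inlined blob id
var BLOCK_SIZE = 4092;                  // Lucene block size in OakDirectory

// How many ids of the given size fit below the document limit.
function maxInlinedIds(docLimit, idSize) {
    return Math.floor(docLimit / idSize);
}

var maxIds = maxInlinedIds(MONGO_DOC_LIMIT, ID_SIZE); // 2048
var maxFileBytes = maxIds * BLOCK_SIZE;               // 8380416, roughly 8 MB
```

In other words, with these numbers a single Lucene index file of about 8 MB would already push its node document to the limit.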

Chetan Mehrotra


On Mon, Jul 15, 2013 at 4:02 PM, Thomas Mueller <mu...@adobe.com> wrote:
> Hi,
>
> This is not just one blob id, but a list of blob ids. It's 796 blob ids,
> each blob id is 8 KB. Total size is about 6 MB.
>
> Regards,
> Thomas
>
> On 7/15/13 11:46 AM, "Chetan Mehrotra" <ch...@gmail.com> wrote:
>
>>> The blob id shouldn't be larger than a few KB at most (it can be
>>> configured).
>>
>>Yup thats what I was thinking after looking into the
>>org.apache.jackrabbit.mk.blobs.AbstractBlobStore#convertBlobToId.
>>Would have a closer look.
>>
>>Chetan Mehrotra
>>
>>
>>On Mon, Jul 15, 2013 at 2:53 PM, Thomas Mueller <mu...@adobe.com> wrote:
>>> Hi,
>>>
>>> The blob id shouldn't be larger than a few KB at most (it can be
>>> configured).
>>>
>>> Regards,
>>> Thomas
>>>
>>>
>>>
>>> On 7/15/13 11:13 AM, "Chetan Mehrotra" <ch...@gmail.com>
>>>wrote:
>>>
>>>>Hi,
>>>>
>>>>I was trying to get an estimate of the size [1] of various nodes
>>>>document in MongoDB for a fresh CQ installation. The largest node was
>>>>for path '4:/oak:index/lucene/:data/_5_Lucene41_0.tim' weighing
>>>>upward of 6 MB. It has one binary property jcr:data
>>>>
>>>>"jcr:data" : {
>>>>               "r13fd2c82e10-0-1" :
>>>>"[\":blobId:00fc1f3fd76c1715424c4f4.....00031aea1\"]
>>>>}
>>>>
>>>>The value stored above is very large. Before digging in further wanted
>>>>to check if this is expected or the blobid should be something
>>>>smaller?
>>>>
>>>>Regards
>>>>Chetan
>>>>
>>>>[1] var max = 0;
>>>>var maxObj
>>>>db.nodes.find().forEach(function(obj) {
>>>>    var curr = Object.bsonsize(obj);
>>>>    if(max < curr) {
>>>>        max = curr;
>>>>        maxObj = obj;
>>>>    }
>>>>})
>>>>print(max);
>>>>printjson(maxObj);
>>>>
>>>>Chetan Mehrotra
>>>
>

Re: Very large blobid with MongoMK

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

This is not just one blob id, but a list of blob ids. It's 796 blob ids,
each blob id is 8 KB. Total size is about 6 MB.
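
One way to check this breakdown is to parse the stored property value and count the entries. The sketch below assumes the value is a JSON-encoded array of strings, each starting with ":blobId:", as the snippet in the original post suggests; `summarizeBlobIds` is a hypothetical helper and the sample ids are made up.

```javascript
// Hypothetical helper: count the ids in a stored jcr:data value and
// report their total and average length in characters.
function summarizeBlobIds(value) {
    var ids = JSON.parse(value); // assumed: a JSON array of id strings
    var totalChars = 0;
    for (var i = 0; i < ids.length; i++) {
        totalChars += ids[i].length;
    }
    return {
        count: ids.length,
        totalChars: totalChars,
        avgChars: ids.length > 0 ? totalChars / ids.length : 0
    };
}

// Made-up sample value with two short ids, for illustration only.
var sampleValue = JSON.stringify([":blobId:00aa", ":blobId:00bbcc"]);
var summary = summarizeBlobIds(sampleValue);
// summary.count is 2, summary.totalChars is 26, summary.avgChars is 13
```

Applied to the real property value, a count of 796 with an average of about 8 KB per entry would account for the roughly 6 MB document.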

Regards,
Thomas

On 7/15/13 11:46 AM, "Chetan Mehrotra" <ch...@gmail.com> wrote:

>> The blob id shouldn't be larger than a few KB at most (it can be
>> configured).
>
>Yup thats what I was thinking after looking into the
>org.apache.jackrabbit.mk.blobs.AbstractBlobStore#convertBlobToId.
>Would have a closer look.
>
>Chetan Mehrotra
>
>
>On Mon, Jul 15, 2013 at 2:53 PM, Thomas Mueller <mu...@adobe.com> wrote:
>> Hi,
>>
>> The blob id shouldn't be larger than a few KB at most (it can be
>> configured).
>>
>> Regards,
>> Thomas
>>
>>
>>
>> On 7/15/13 11:13 AM, "Chetan Mehrotra" <ch...@gmail.com>
>>wrote:
>>
>>>Hi,
>>>
>>>I was trying to get an estimate of the size [1] of various nodes
>>>document in MongoDB for a fresh CQ installation. The largest node was
>>>for path '4:/oak:index/lucene/:data/_5_Lucene41_0.tim' weighing
>>>upward of 6 MB. It has one binary property jcr:data
>>>
>>>"jcr:data" : {
>>>               "r13fd2c82e10-0-1" :
>>>"[\":blobId:00fc1f3fd76c1715424c4f4.....00031aea1\"]
>>>}
>>>
>>>The value stored above is very large. Before digging in further wanted
>>>to check if this is expected or the blobid should be something
>>>smaller?
>>>
>>>Regards
>>>Chetan
>>>
>>>[1] var max = 0;
>>>var maxObj
>>>db.nodes.find().forEach(function(obj) {
>>>    var curr = Object.bsonsize(obj);
>>>    if(max < curr) {
>>>        max = curr;
>>>        maxObj = obj;
>>>    }
>>>})
>>>print(max);
>>>printjson(maxObj);
>>>
>>>Chetan Mehrotra
>>


Re: Very large blobid with MongoMK

Posted by Chetan Mehrotra <ch...@gmail.com>.
> The blob id shouldn't be larger than a few KB at most (it can be
> configured).

Yup, that's what I was thinking after looking into
org.apache.jackrabbit.mk.blobs.AbstractBlobStore#convertBlobToId.
I will have a closer look.

Chetan Mehrotra


On Mon, Jul 15, 2013 at 2:53 PM, Thomas Mueller <mu...@adobe.com> wrote:
> Hi,
>
> The blob id shouldn't be larger than a few KB at most (it can be
> configured).
>
> Regards,
> Thomas
>
>
>
> On 7/15/13 11:13 AM, "Chetan Mehrotra" <ch...@gmail.com> wrote:
>
>>Hi,
>>
>>I was trying to get an estimate of the size [1] of various nodes
>>document in MongoDB for a fresh CQ installation. The largest node was
>>for path '4:/oak:index/lucene/:data/_5_Lucene41_0.tim' weighing
>>upward of 6 MB. It has one binary property jcr:data
>>
>>"jcr:data" : {
>>               "r13fd2c82e10-0-1" :
>>"[\":blobId:00fc1f3fd76c1715424c4f4.....00031aea1\"]
>>}
>>
>>The value stored above is very large. Before digging in further wanted
>>to check if this is expected or the blobid should be something
>>smaller?
>>
>>Regards
>>Chetan
>>
>>[1] var max = 0;
>>var maxObj
>>db.nodes.find().forEach(function(obj) {
>>    var curr = Object.bsonsize(obj);
>>    if(max < curr) {
>>        max = curr;
>>        maxObj = obj;
>>    }
>>})
>>print(max);
>>printjson(maxObj);
>>
>>Chetan Mehrotra
>

Re: Very large blobid with MongoMK

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

The blob id shouldn't be larger than a few KB at most (it can be
configured).

Regards,
Thomas



On 7/15/13 11:13 AM, "Chetan Mehrotra" <ch...@gmail.com> wrote:

>Hi,
>
>I was trying to get an estimate of the size [1] of various nodes
>document in MongoDB for a fresh CQ installation. The largest node was
>for path '4:/oak:index/lucene/:data/_5_Lucene41_0.tim' weighing
>upward of 6 MB. It has one binary property jcr:data
>
>"jcr:data" : {
>		"r13fd2c82e10-0-1" :
>"[\":blobId:00fc1f3fd76c1715424c4f4.....00031aea1\"]
>}
>
>The value stored above is very large. Before digging in further wanted
>to check if this is expected or the blobid should be something
>smaller?
>
>Regards
>Chetan
>
>[1] var max = 0;
>var maxObj
>db.nodes.find().forEach(function(obj) {
>    var curr = Object.bsonsize(obj);
>    if(max < curr) {
>        max = curr;
>        maxObj = obj;
>    }
>})
>print(max);
>printjson(maxObj);
>
>Chetan Mehrotra


Re: Very large blobid with MongoMK

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

Maybe the weighting algorithm isn't correct here. Do you know the size of
the blob id?

Regards,
Thomas


On 7/15/13 11:13 AM, "Chetan Mehrotra" <ch...@gmail.com> wrote:

>Hi,
>
>I was trying to get an estimate of the size [1] of various nodes
>document in MongoDB for a fresh CQ installation. The largest node was
>for path '4:/oak:index/lucene/:data/_5_Lucene41_0.tim' weighing
>upward of 6 MB. It has one binary property jcr:data
>
>"jcr:data" : {
>		"r13fd2c82e10-0-1" :
>"[\":blobId:00fc1f3fd76c1715424c4f4.....00031aea1\"]
>}
>
>The value stored above is very large. Before digging in further wanted
>to check if this is expected or the blobid should be something
>smaller?
>
>Regards
>Chetan
>
>[1] var max = 0;
>var maxObj
>db.nodes.find().forEach(function(obj) {
>    var curr = Object.bsonsize(obj);
>    if(max < curr) {
>        max = curr;
>        maxObj = obj;
>    }
>})
>print(max);
>printjson(maxObj);
>
>Chetan Mehrotra