Posted to oak-issues@jackrabbit.apache.org by "Norberto Leite (JIRA)" <ji...@apache.org> on 2014/11/21 17:16:33 UTC

[jira] [Updated] (OAK-2284) Better locality for blobs collections over sharding

     [ https://issues.apache.org/jira/browse/OAK-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Norberto Leite updated OAK-2284:
--------------------------------
    Description: 
Currently, when using Oak with MongoMK for blob storage, we can easily end up with the chunks of a single binary stream scattered across the shards.

This is not ideal, since it generates a large number of scatter-gather queries across the shards for each individual file.

To allow better locality, I propose the addition of another field called {{_anchor}}.
This anchor field will be generated from the timestamp taken when storage of the file begins, with its components in inverse order (milliseconds first, hours last):
{code}
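import java.text.SimpleDateFormat;
import java.util.Date;

// Pattern order: milliseconds, seconds, minutes, hours (HH)
SimpleDateFormat sdf = new SimpleDateFormat("SSSssmmHH");
// Store the parsed integer of this value for more storage efficiency
// (the pattern yields at most 9 digits, so it fits in an int)
String a = sdf.format(new Date());
int _anchor = Integer.parseInt(a);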
{code}

This new {{_anchor}} field should be part of the shard key, which also requires it to be indexed alongside {{_id}}.
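To illustrate the idea, here is a minimal sketch of how the anchor could be computed once, when storage of a file begins, and then reused for every chunk of that file. The class name {{AnchorSketch}}, the helper {{anchorFor}} and the choice of UTC are assumptions made for this example only; they are not part of Oak or of the proposed patch.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class AnchorSketch {

    // Hypothetical helper: derives the anchor from the time at which
    // storage of a file begins. Using UTC is an assumption made here so
    // that the same instant gives the same value on every machine.
    public static int anchorFor(Date start) {
        SimpleDateFormat sdf = new SimpleDateFormat("SSSssmmHH");
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        // The formatted value has 9 digits at most, so it fits in an int.
        return Integer.parseInt(sdf.format(start));
    }

    public static void main(String[] args) {
        // Computed once per file, then attached to every chunk of that file.
        int anchor = anchorFor(new Date());
        for (int chunk = 0; chunk < 3; chunk++) {
            System.out.println("chunk " + chunk + " -> _anchor " + anchor);
        }
    }
}
```

Because every chunk carries the same {{_anchor}} value, a shard key that includes this field could keep the chunks of one file together, which is the locality this issue is after.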


A pull request is in the making!

N.



> Better locality for blobs collections over sharding
> ---------------------------------------------------
>
>                 Key: OAK-2284
>                 URL: https://issues.apache.org/jira/browse/OAK-2284
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: blob, mongomk
>    Affects Versions: 1.1.2
>            Reporter: Norberto Leite
>              Labels: performance, sharding
>   Original Estimate: 96h
>  Remaining Estimate: 96h



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)