Posted to oak-issues@jackrabbit.apache.org by "Norberto Leite (JIRA)" <ji...@apache.org> on 2014/11/21 17:16:33 UTC
[jira] [Updated] (OAK-2284) Better locality for blobs collections over sharding
[ https://issues.apache.org/jira/browse/OAK-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Norberto Leite updated OAK-2284:
--------------------------------
Description:
Currently, when using Oak with MongoMK for blob storage, the chunks of a single binary stream can easily end up scattered across all the shards.
This is not ideal, since reading each individual file then generates a large number of scatter-gather queries over the shards.
To allow better locality, I propose adding another field called {{_anchor}}.
This anchor field is generated from the timestamp at which storage of the file begins, with its components written in inverse order (milliseconds first, hour last):
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

// Pattern order: milliseconds (SSS), seconds (ss), minutes (mm), hour of day (HH)
SimpleDateFormat sdf = new SimpleDateFormat("SSSssmmHH");
// Format the start time once, then store the parsed integer for more storage efficiency
String a = sdf.format(new Date());
int _anchor = Integer.parseInt(a);
{code}
This new {{_anchor}} field should be part of the shard key, which means it also needs to be indexed alongside {{_id}}.
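One way to read the proposal: the anchor is computed once, when storage of a binary begins, and every chunk of that binary carries the same value, so a shard key that leads with {{_anchor}} keeps those chunks together. A minimal sketch of that idea follows, using plain Java maps in place of MongoDB documents. The class and method names (AnchorSketch, anchorFor, chunkDocuments), the chunk ids, and the fixed UTC time zone are assumptions made for illustration; none of them come from Oak or from the pull request.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical helper, for illustration only; not part of Oak.
class AnchorSketch {

    // Builds the anchor from the start time: milliseconds, seconds, minutes, hour of day.
    static int anchorFor(Date start) {
        SimpleDateFormat sdf = new SimpleDateFormat("SSSssmmHH");
        // Assumption: a fixed zone, so every cluster node derives the same value.
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        // At most 9 digits (999595923 at the largest), which fits in an int.
        return Integer.parseInt(sdf.format(start));
    }

    // Builds one map per chunk; the anchor is computed once and shared by all chunks.
    static List<Map<String, Object>> chunkDocuments(List<String> chunkIds, Date start) {
        int anchor = anchorFor(start);
        List<Map<String, Object>> docs = new ArrayList<>();
        for (String id : chunkIds) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("_id", id);          // existing chunk identifier
            doc.put("_anchor", anchor);  // proposed locality field
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Hypothetical chunk ids standing in for the chunk identifiers.
        List<String> ids = Arrays.asList("chunk-a", "chunk-b", "chunk-c");
        for (Map<String, Object> doc : chunkDocuments(ids, new Date())) {
            System.out.println(doc);
        }
    }
}
```

Under the UTC assumption, a start time of 01:02:03.004 formats as 004030201 and is stored as the integer 4030201. Because all chunks of one binary share that value, a compound shard key on {{_anchor}} and {{_id}} would place them in the same key range.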
A pull request is in the making!
N.
was:
Currently, when using Oak with MongoMK for blob storage, the chunks of a single binary stream can easily end up scattered across all the shards.
This is not ideal, since reading each individual file then generates a large number of scatter-gather queries over the shards.
To allow better locality, I propose adding another field called {{_anchor}}.
This anchor field is generated from the timestamp at which storage of the file begins, with its components written in inverse order (milliseconds first, hour last):
{code}
// Pattern order: milliseconds (SSS), seconds (ss), minutes (mm), hour of day (HH)
SimpleDateFormat sdf = new SimpleDateFormat("SSSssmmHH");
// Format the start time once, then store the parsed integer for more storage efficiency
String a = sdf.format(new Date());
int _anchor = Integer.parseInt(a);
{code}
A pull request is in the making!
N.
> Better locality for blobs collections over sharding
> ---------------------------------------------------
>
> Key: OAK-2284
> URL: https://issues.apache.org/jira/browse/OAK-2284
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: blob, mongomk
> Affects Versions: 1.1.2
> Reporter: Norberto Leite
> Labels: performance, sharding
> Original Estimate: 96h
> Remaining Estimate: 96h