Posted to oak-dev@jackrabbit.apache.org by Timothée Maret <ti...@gmail.com> on 2015/06/29 12:35:28 UTC

S3DataStore leverage Cross-Region Replication

Hi,

In a cross-region setup using the S3 data store, it may make sense to
leverage the Cross-Region Replication feature of S3 buckets [0,1].

In order to avoid data replication issues, it would make sense IMO to allow
configuring the S3DataStore with two S3 buckets, one for writing and one
for reading.
The writing bucket would be shared among all instances (from all regions),
while each region would have its own reading bucket (thus decreasing read
latency).
The writing bucket would auto-replicate to the reading buckets.
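
A minimal sketch of what that read/write split could look like at the S3
level, using the AWS SDK for Java (the bucket names and the fallback to the
writing bucket are my assumptions, not something the current S3DataStore
supports):

import java.io.File;
import java.io.InputStream;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.AmazonS3Exception;

/**
 * Sketch: all instances write to one shared bucket, reads prefer the
 * region-local bucket that Cross-Region Replication keeps in sync.
 * Bucket names are hypothetical.
 */
public class ReadWriteSplitBlobAccess {

    private static final String WRITE_BUCKET = "oak-datastore-shared";   // shared, written to from all regions
    private static final String READ_BUCKET = "oak-datastore-eu-west-1"; // CRR replica in the local region

    private final AmazonS3 s3 = new AmazonS3Client(); // credentials from the default provider chain

    public void write(String key, File file) {
        // Writes always target the shared bucket; CRR fans the object out
        // to the per-region reading buckets asynchronously.
        s3.putObject(WRITE_BUCKET, key, file);
    }

    public InputStream read(String key) {
        try {
            // Fast path: the region-local replica.
            return s3.getObject(READ_BUCKET, key).getObjectContent();
        } catch (AmazonS3Exception e) {
            if (e.getStatusCode() != 404) {
                throw e;
            }
            // Replication is asynchronous, so a freshly written binary may
            // not have arrived yet; fall back to the writing bucket.
            return s3.getObject(WRITE_BUCKET, key).getObjectContent();
        }
    }
}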

Has this been tested already? Generally, wdyt?

Regards,

Timothee



[0]
https://aws.amazon.com/blogs/aws/new-cross-region-replication-for-amazon-s3/
[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html

RE: S3DataStore leverage Cross-Region Replication

Posted by Shashank Gupta <sh...@adobe.com>.
Yes, Michael. It would be in Oak's BlobStore.

Thanks,
-shashank

-----Original Message-----
From: Michael Marth [mailto:mmarth@adobe.com] 
Sent: Tuesday, June 30, 2015 4:50 PM
To: oak-dev@jackrabbit.apache.org
Subject: Re: S3DataStore leverage Cross-Region Replication

Shashank,

In case we think it’s needed to implement multiple chained S3 DSs then I think we should model it after Jackrabbit’s Multidatastore which allows arbitrary DS implementations to be chained:
http://jackrabbit.510166.n4.nabble.com/MultiDataStore-td4655772.html

Michael




On 30/06/15 12:11, "Shashank Gupta" <sh...@adobe.com> wrote:

>Hi Tim,
>There is no time bound SLA provided by AWS when a given binary would be successfully replicated to destination S3 bucket.  There would be cases of missing binaries if mongo nodes sync faster than S3 replication.  Also S3 replication works between a given pair of buckets. So one S3 bucket can replicate to a single S3 destination bucket. 
>
>I think we can implement a tiered S3Datastore which writes/reads to/from multiple S3 buckets. The tiered S3DS first tries to read from same-region bucket and if not found than fallback to cross-geo buckets. 
>
>> Has this been tested already ? Generally, wdyt ?
>No. I suggest to first test cross geo mongo deployment with single S3 bucket. There shouldn't be functional issue in using single S3 bucket. Few customers use single shared S3 bucket between non-clustered cross-geo jackrabbit2 repositories in production. 
>
>Thanks,
>-shashank
>
>
>
>
>-----Original Message-----
>From: maret.timothee@gmail.com [mailto:maret.timothee@gmail.com] On 
>Behalf Of Timothée Maret
>Sent: Monday, June 29, 2015 4:05 PM
>To: oak-dev@jackrabbit.apache.org
>Subject: S3DataStore leverage Cross-Region Replication
>
>Hi,
>
>In a cross region setup using the S3 data store, it may make sense to leverage the Cross-Region auto replication of S3 buckets [0,1].
>
>In order to avoid data replication issues it would make sense IMO to allow configuring the S3DataStore with two S3 buckets, one for writing and one for reading.
>The writing bucket would be shared among all instance (from all regions) while the reading bucket would be in each region (thus decreasing the latency).
>The writing bucket would auto replicate to the reading buckets.
>
>Has this been tested already ? Generally, wdyt ?
>
>Regards,
>
>Timothee
>
>
>
>[0]
>https://aws.amazon.com/blogs/aws/new-cross-region-replication-for-amazon-s3/
>[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html

Re: S3DataStore leverage Cross-Region Replication

Posted by Michael Marth <mm...@adobe.com>.
Shashank,

If we think it’s needed to implement multiple chained S3 DSs, then I think we should model it after Jackrabbit’s MultiDataStore, which allows arbitrary DS implementations to be chained:
http://jackrabbit.510166.n4.nabble.com/MultiDataStore-td4655772.html
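
FWIW, a rough sketch of the delegation pattern (written against a simplified,
hypothetical DataStore-like interface rather than the real Jackrabbit API,
just to illustrate the chaining):

import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified stand-in for the real DataStore API, for illustration only. */
interface SimpleDataStore {
    String addRecord(InputStream in) throws Exception;                 // returns the record identifier
    InputStream getRecordIfStored(String identifier) throws Exception; // null if absent
}

/**
 * Chains arbitrary data store implementations, in the spirit of
 * Jackrabbit's MultiDataStore: writes go to the primary store,
 * reads fall through the chain until a record is found.
 */
class ChainedDataStore implements SimpleDataStore {

    private final List<SimpleDataStore> delegates = new ArrayList<>();

    ChainedDataStore(SimpleDataStore primary, SimpleDataStore... fallbacks) {
        delegates.add(primary);
        delegates.addAll(Arrays.asList(fallbacks));
    }

    @Override
    public String addRecord(InputStream in) throws Exception {
        // New binaries are only written through the primary store.
        return delegates.get(0).addRecord(in);
    }

    @Override
    public InputStream getRecordIfStored(String identifier) throws Exception {
        // Try each delegate in order, e.g. the same-region S3 DS first,
        // then the cross-geo one(s).
        for (SimpleDataStore ds : delegates) {
            InputStream in = ds.getRecordIfStored(identifier);
            if (in != null) {
                return in;
            }
        }
        return null;
    }
}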

Michael




On 30/06/15 12:11, "Shashank Gupta" <sh...@adobe.com> wrote:

>Hi Tim,
>There is no time bound SLA provided by AWS when a given binary would be successfully replicated to destination S3 bucket.  There would be cases of missing binaries if mongo nodes sync faster than S3 replication.  Also S3 replication works between a given pair of buckets. So one S3 bucket can replicate to a single S3 destination bucket. 
>
>I think we can implement a tiered S3Datastore which writes/reads to/from multiple S3 buckets. The tiered S3DS first tries to read from same-region bucket and if not found than fallback to cross-geo buckets. 
>
>> Has this been tested already ? Generally, wdyt ?
>No. I suggest to first test cross geo mongo deployment with single S3 bucket. There shouldn't be functional issue in using single S3 bucket. Few customers use single shared S3 bucket between non-clustered cross-geo jackrabbit2 repositories in production. 
>
>Thanks,
>-shashank
>
>
>
>
>-----Original Message-----
>From: maret.timothee@gmail.com [mailto:maret.timothee@gmail.com] On Behalf Of Timothée Maret
>Sent: Monday, June 29, 2015 4:05 PM
>To: oak-dev@jackrabbit.apache.org
>Subject: S3DataStore leverage Cross-Region Replication
>
>Hi,
>
>In a cross region setup using the S3 data store, it may make sense to leverage the Cross-Region auto replication of S3 buckets [0,1].
>
>In order to avoid data replication issues it would make sense IMO to allow configuring the S3DataStore with two S3 buckets, one for writing and one for reading.
>The writing bucket would be shared among all instance (from all regions) while the reading bucket would be in each region (thus decreasing the latency).
>The writing bucket would auto replicate to the reading buckets.
>
>Has this been tested already ? Generally, wdyt ?
>
>Regards,
>
>Timothee
>
>
>
>[0]
>https://aws.amazon.com/blogs/aws/new-cross-region-replication-for-amazon-s3/
>[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html

Re: S3DataStore leverage Cross-Region Replication

Posted by Bruce Edge <br...@texture.com>.
Maybe not a bottleneck, but for large data sets, would it reduce the data transfer costs? The inter-S3-bucket transfer would use AWS’ internal links rather than routing over external interfaces.

-Bruce



> > Has this been tested already ? Generally, wdyt ?
> No. I suggest to first test cross geo mongo deployment with single S3
> bucket. There shouldn't be functional issue in using single S3 bucket. Few
> customers use single shared S3 bucket between non-clustered cross-geo
> jackrabbit2 repositories in production.
>
>
> Sure, adding more complexity only make sense if we can demonstrate this is
> a bottleneck.
>
> Regards,
>
> Timothee



Re: S3DataStore leverage Cross-Region Replication

Posted by Timothée Maret <ti...@gmail.com>.
Hi Shashank,

Thanks for this.

2015-06-30 12:11 GMT+02:00 Shashank Gupta <sh...@adobe.com>:

> Hi Tim,
> There is no time bound SLA provided by AWS when a given binary would be
> successfully replicated to destination S3 bucket.

> There would be cases of missing binaries if mongo nodes sync faster than S3
> replication.


Yes, this would be expected, until the buckets replicate.


>   Also S3 replication works between a given pair of buckets. So one S3
> bucket can replicate to a single S3 destination bucket.
>

Yes, the setup would be limited to two regions.


>
> I think we can implement a tiered S3Datastore which writes/reads to/from
> multiple S3 buckets. The tiered S3DS first tries to read from same-region
> bucket and if not found than fallback to cross-geo buckets.
>

Great, although I see it would be valuable in a limited set of use cases
(only two regions involved).


>
> > Has this been tested already ? Generally, wdyt ?
> No. I suggest to first test cross geo mongo deployment with single S3
> bucket. There shouldn't be functional issue in using single S3 bucket. Few
> customers use single shared S3 bucket between non-clustered cross-geo
> jackrabbit2 repositories in production.
>

Sure, adding more complexity only makes sense if we can demonstrate this is
a bottleneck.

Regards,

Timothee


>
> Thanks,
> -shashank
>
>
>
>
> -----Original Message-----
> From: maret.timothee@gmail.com [mailto:maret.timothee@gmail.com] On
> Behalf Of Timothée Maret
> Sent: Monday, June 29, 2015 4:05 PM
> To: oak-dev@jackrabbit.apache.org
> Subject: S3DataStore leverage Cross-Region Replication
>
> Hi,
>
> In a cross region setup using the S3 data store, it may make sense to
> leverage the Cross-Region auto replication of S3 buckets [0,1].
>
> In order to avoid data replication issues it would make sense IMO to allow
> configuring the S3DataStore with two S3 buckets, one for writing and one
> for reading.
> The writing bucket would be shared among all instance (from all regions)
> while the reading bucket would be in each region (thus decreasing the
> latency).
> The writing bucket would auto replicate to the reading buckets.
>
> Has this been tested already ? Generally, wdyt ?
>
> Regards,
>
> Timothee
>
>
>
> [0]
>
> https://aws.amazon.com/blogs/aws/new-cross-region-replication-for-amazon-s3/
> [1] https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html
>

RE: S3DataStore leverage Cross-Region Replication

Posted by Shashank Gupta <sh...@adobe.com>.
Hi Tim,
There is no time-bound SLA from AWS for when a given binary will be successfully replicated to the destination S3 bucket. There would be cases of missing binaries if the Mongo nodes sync faster than S3 replication. Also, S3 replication works between a given pair of buckets, so one S3 bucket can replicate to only a single destination S3 bucket.

I think we can implement a tiered S3DataStore which writes/reads to/from multiple S3 buckets. The tiered S3DS would first try to read from the same-region bucket and, if the binary is not found, fall back to the cross-geo buckets.
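
Roughly, the read path of such a tiered S3DS could look like the sketch below
(plain AWS SDK calls, placeholder bucket names; the real implementation would
of course sit behind the DataStore/BlobStore API):

import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.AmazonS3Exception;

/**
 * Hypothetical tiered read: consult the same-region bucket first,
 * then fall back to the cross-geo buckets in order.
 */
public class TieredS3Reader {

    private final AmazonS3 s3 = new AmazonS3Client();

    // Same-region bucket first, then cross-geo fallbacks (placeholders).
    private final List<String> buckets = Arrays.asList(
            "oak-blobs-us-east-1",
            "oak-blobs-eu-west-1",
            "oak-blobs-ap-southeast-1");

    public InputStream read(String key) {
        for (String bucket : buckets) {
            try {
                return s3.getObject(bucket, key).getObjectContent();
            } catch (AmazonS3Exception e) {
                if (e.getStatusCode() != 404) {
                    throw e; // real failure, propagate
                }
                // not (or not yet) in this bucket, try the next tier
            }
        }
        throw new IllegalStateException("Binary " + key + " not found in any bucket");
    }
}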

> Has this been tested already ? Generally, wdyt ?
No. I suggest first testing a cross-geo Mongo deployment with a single S3 bucket. There shouldn't be a functional issue in using a single S3 bucket. A few customers use a single shared S3 bucket between non-clustered, cross-geo Jackrabbit 2 repositories in production.

Thanks,
-shashank




-----Original Message-----
From: maret.timothee@gmail.com [mailto:maret.timothee@gmail.com] On Behalf Of Timothée Maret
Sent: Monday, June 29, 2015 4:05 PM
To: oak-dev@jackrabbit.apache.org
Subject: S3DataStore leverage Cross-Region Replication

Hi,

In a cross region setup using the S3 data store, it may make sense to leverage the Cross-Region auto replication of S3 buckets [0,1].

In order to avoid data replication issues it would make sense IMO to allow configuring the S3DataStore with two S3 buckets, one for writing and one for reading.
The writing bucket would be shared among all instance (from all regions) while the reading bucket would be in each region (thus decreasing the latency).
The writing bucket would auto replicate to the reading buckets.

Has this been tested already ? Generally, wdyt ?

Regards,

Timothee



[0]
https://aws.amazon.com/blogs/aws/new-cross-region-replication-for-amazon-s3/
[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html