Posted to dev@distributedlog.apache.org by Jay Juma <ja...@gmail.com> on 2016/10/14 09:16:12 UTC

A global cluster for both global streams and local streams

Based on my understanding, a global cluster can be set up to span multiple
datacenters, and the data placement policy will place the data across
multiple datacenters.
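To make "placing data across multiple datacenters" concrete, here is a minimal Python sketch of region-aware ensemble selection. It assumes each bookie is labelled with its datacenter and that the policy simply alternates between datacenters when picking an ensemble. `select_global_ensemble` is a hypothetical name, and BookKeeper's real placement policies apply more rules than this.

```python
from itertools import zip_longest


def select_global_ensemble(bookies_by_dc, ensemble_size):
    """Pick `ensemble_size` bookies, alternating between datacenters.

    Hypothetical sketch: `bookies_by_dc` maps a datacenter name to its list
    of bookie addresses. Interleaving the lists makes consecutive picks land
    in different datacenters, so the ensemble spans all of them.
    """
    # Sort datacenter names so the result is deterministic.
    columns = [bookies_by_dc[dc] for dc in sorted(bookies_by_dc)]
    # zip_longest pads shorter lists with None; drop the padding.
    interleaved = [b for row in zip_longest(*columns) for b in row if b is not None]
    if len(interleaved) < ensemble_size:
        raise ValueError("not enough bookies for the requested ensemble")
    return interleaved[:ensemble_size]


bookies = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}
print(select_global_ensemble(bookies, 4))  # ['a1', 'b1', 'a2', 'b2']
```

Because the picks alternate, even a two-bookie ensemble holds one copy in each datacenter.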

My question is: "Is it possible to create a log stream on a global cluster
that writes only to bookies within the same datacenter?"

I have two use cases. The first is database replication. For example,
suppose there are two datacenters, A and B, and a global DL cluster is set
up over A and B. The database cluster in A writes updates into one or more
global log streams, and the database in B tail-reads those streams and
applies the changes. I think this is a very typical use case for
DistributedLog, right?
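The replication pattern above (a writer in A appending updates, a reader in B tailing the stream and applying them) can be sketched as follows. This is a minimal in-memory model, assuming a single writer and a reader that remembers the position of the last record it applied. `LogStream` and `TailingReplica` are hypothetical stand-ins, not the DistributedLog client API.

```python
class LogStream:
    """In-memory stand-in for a replicated log stream (hypothetical)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Each record gets a monotonically increasing position,
        # similar in spirit to a log sequence number.
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, position):
        # Return (position, record) pairs at or after `position`.
        return list(enumerate(self._records))[position:]


class TailingReplica:
    """Replica in datacenter B that tail-reads the stream and applies changes."""

    def __init__(self, stream):
        self.stream = stream
        self.state = {}
        self.next_position = 0  # first position not yet applied

    def catch_up(self):
        for position, (key, value) in self.stream.read_from(self.next_position):
            self.state[key] = value  # apply the update
            self.next_position = position + 1


stream = LogStream()
replica = TailingReplica(stream)
stream.append(("user:1", "alice"))   # written by the database in A
stream.append(("user:2", "bob"))
replica.catch_up()
stream.append(("user:1", "alice2"))
replica.catch_up()                   # only the new record is applied
print(replica.state)  # {'user:1': 'alice2', 'user:2': 'bob'}
```

Tracking `next_position` is what lets the reader resume from where it stopped instead of replaying the whole stream.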

The second use case is replication within a single datacenter only; it
doesn't need to replicate to the other datacenter. We want to share the DL
cluster between these two use cases. Is there a way to achieve that, for
example by tuning the data placement policy for individual streams?
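One way to picture per-stream placement: each stream carries a placement scope, and a "local" stream draws its bookies only from the writer's datacenter while a "global" stream may use any datacenter. This is a hypothetical sketch; the scope values, stream names, and `candidate_bookies` are illustrative assumptions, not existing DistributedLog configuration.

```python
def candidate_bookies(bookie_dc, scope, writer_dc):
    """Return the bookies a stream's ensemble may be drawn from.

    Hypothetical sketch: bookie_dc maps bookie address -> datacenter.
    A "local" stream is confined to the writer's datacenter; a "global"
    stream may use bookies in any datacenter.
    """
    if scope == "local":
        return sorted(b for b, dc in bookie_dc.items() if dc == writer_dc)
    if scope == "global":
        return sorted(bookie_dc)
    raise ValueError(f"unknown placement scope: {scope!r}")


bookie_dc = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
# Hypothetical per-stream settings sharing one cluster.
streams = {"db-updates": "global", "local-cache-log": "local"}

for name, scope in streams.items():
    print(name, candidate_bookies(bookie_dc, scope, writer_dc="A"))
# db-updates ['a1', 'a2', 'b1', 'b2']
# local-cache-log ['a1', 'a2']
```

The ensemble-selection step would then run over this candidate pool, so both kinds of stream can share the same set of bookies.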

Let me know if you need more information. Appreciate your help.

Thanks,
Jay

Re: A global cluster for both global streams and local streams

Posted by Jay Juma <ja...@gmail.com>.
Sijie,

Thank you so much for answering my question. I've created a JIRA for this
discussion: https://issues.apache.org/jira/browse/DL-58

When do you think this task will be prioritized?

- Jay

On Tue, Oct 18, 2016 at 2:00 PM, Sijie Guo <si...@twitter.com.invalid>
wrote:

> Do you mind creating a JIRA for us to track this use case?

Re: A global cluster for both global streams and local streams

Posted by Sijie Guo <si...@twitter.com.INVALID>.
On Fri, Oct 14, 2016 at 2:16 AM, Jay Juma <ja...@gmail.com> wrote:

> Based on my understanding, a global cluster can be set up to span
> multiple datacenters, and the data placement policy will place the data
> across multiple datacenters.
>
> My question is: "Is it possible to create a log stream on a global cluster
> that writes only to bookies within the same datacenter?"
>

In theory, yes. However, the data placement policy is currently configured
per cluster.

We could consider pushing the data placement policy down into the log
segment metadata, so that bookie auto-recovery knows which data placement
policy to use when re-replicating or recovering a BookKeeper ledger.
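To illustrate the suggestion: if each log segment's metadata records the placement policy it was written with, a recovery process can pick a replacement bookie that still satisfies that policy instead of applying one cluster-wide rule. The sketch below is hypothetical. `SegmentMetadata` is a simplified stand-in with assumed `placement_policy` and `home_dc` fields, and the recovery logic is far simpler than BookKeeper's auto-recovery.

```python
from dataclasses import dataclass


@dataclass
class SegmentMetadata:
    """Hypothetical segment metadata that records its placement policy."""
    ledger_id: int
    ensemble: list
    placement_policy: str  # assumed values: "local" or "global"
    home_dc: str           # datacenter the segment was written from


def replacement_bookie(segment, failed, bookie_dc, live_bookies):
    """Pick a live bookie to take over `failed`'s replica, or None.

    The rule comes from the segment's own metadata rather than from a
    cluster-wide setting, so a "local" segment never gains a replica
    outside its home datacenter.
    """
    in_use = set(segment.ensemble)
    for bookie in sorted(live_bookies):
        if bookie in in_use:
            continue
        dc = bookie_dc[bookie]
        if segment.placement_policy == "local" and dc != segment.home_dc:
            continue
        # Assumed rule for "global" segments: replace within the failed
        # bookie's datacenter so the segment keeps spanning the same DCs.
        if segment.placement_policy == "global" and dc != bookie_dc[failed]:
            continue
        return bookie
    return None  # nothing satisfies the policy; recovery has to wait


bookie_dc = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
live = ["a1", "a3", "b1", "b2"]  # bookie a2 has failed

local_seg = SegmentMetadata(1, ["a1", "a2"], "local", "A")
global_seg = SegmentMetadata(2, ["a2", "b1"], "global", "A")
full_local = SegmentMetadata(3, ["a1", "a2", "a3"], "local", "A")

print(replacement_bookie(local_seg, "a2", bookie_dc, live))   # a3
print(replacement_bookie(global_seg, "a2", bookie_dc, live))  # a1
print(replacement_bookie(full_local, "a2", bookie_dc, live))  # None
```

In the last case the local segment stays under-replicated rather than spilling into datacenter B, which is the behaviour a per-segment policy is meant to guarantee.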


>
> I have two use cases. The first is database replication. For example,
> suppose there are two datacenters, A and B, and a global DL cluster is set
> up over A and B. The database cluster in A writes updates into one or more
> global log streams, and the database in B tail-reads those streams and
> applies the changes. I think this is a very typical use case for
> DistributedLog, right?
>
> The second use case is replication within a single datacenter only; it
> doesn't need to replicate to the other datacenter. We want to share the DL
> cluster between these two use cases. Is there a way to achieve that, for
> example by tuning the data placement policy for individual streams?
>

Typically we don't mix globally replicated logs with locally replicated
logs. But making the data placement policy configurable per stream seems
like a good option for your use case.
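A per-stream setting could layer on top of the current per-cluster one: a stream uses its own placement policy if one is configured and otherwise falls back to the cluster default. A minimal sketch, where `effective_policy`, the override mapping, and the policy names are all hypothetical:

```python
CLUSTER_DEFAULT_POLICY = "global"  # hypothetical cluster-wide setting


def effective_policy(stream_name, stream_overrides, cluster_default=CLUSTER_DEFAULT_POLICY):
    """Resolve the placement policy for a stream (hypothetical sketch).

    Streams without an explicit override fall back to the cluster-wide
    policy, i.e. the current per-cluster behaviour.
    """
    return stream_overrides.get(stream_name, cluster_default)


overrides = {"local-cache-log": "local"}
print(effective_policy("db-updates", overrides))       # global
print(effective_policy("local-cache-log", overrides))  # local
```

Falling back to the cluster default keeps existing streams unchanged, so only streams that opt in would behave differently.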


Do you mind creating a JIRA for us to track this use case?

