Posted to user@cassandra.apache.org by Yudong Gao <st...@umich.edu> on 2011/04/06 00:05:28 UTC

Location-aware replication based on objects' access pattern

Hi,

I am thinking about using Cassandra for our research project, and we
are interested in one particular feature.

Our setup has multiple data centers in different geographic
locations. Data is accessed with predictable patterns. Think of
something like Craigslist: data objects corresponding to CA will
mostly be accessed by users on the West Coast. In this case, if all
the replicas are stored on the East Coast, access will not be
efficient. Other applications, such as Facebook, should have a
similar concern.

I am aware of placement strategies such as
RackAwareStrategy/NetworkTopologyStrategy, but they place objects
based on their hashed token, not their access pattern. I am
thinking about one possible trick, which is to manipulate the key of
the object based on its access pattern, so that the key maps
to a token that has at least one replica (ideally the primary
replica) stored in the desired data center, with the other replicas
stored in other data centers for reliability.

I found this post discussing a similar problem,

http://www.mail-archive.com/user@cassandra.apache.org/msg00695.html

but Ben suggested just writing a new replication strategy. IMO,
location-aware replication should be a common problem for Cassandra,
especially since it is widely used in large-scale
commercial applications such as Facebook and Twitter. I am interested
in how they handle this problem.

Is there any existing solution that I can refer to and get started with?

Thanks!

Yudong

Re: Location-aware replication based on objects' access pattern

Posted by Sasha Dolgy <sd...@gmail.com>.
I use the global ring to provide quick reference data.  I don't see it
as a bottleneck because there isn't a lot of data floating around it,
just enough to satisfy the specific use cases we have.  If the end user
needs to access the data in the US ring while in Asia via the web, I
would opt to transfer their data queries through application servers
accessing the US ring...bypassing the European application servers.
You referenced craigslist earlier.  That's how I would design it ..
user clicks on Zurich but their default is Toronto ... Quick check to
the global ring to get a list of application servers for Zurich and
continue on business as usual querying the specific data for
Zurich/Europe.

As for replication, we are playing with Ec2Snitch now to leverage
multiple EC2 instances all over the place ... some are dedicated to
Europe, but will exist in Americas (for an extreme use case) .. while
normally we will balance between two regional sites (A / B)

Having said all that, I'll defer to the people on this list who have
much more commercial experience with it ...

On Wed, Apr 6, 2011 at 6:40 PM, Yudong Gao <st...@umich.edu> wrote:
>
> This is interesting. But how do you design the global ring to make
> sure that it is not the bottleneck? For example, if a client need to
> access data in the US ring, but she need to first talk to a europe
> node to get the reference data, this will not be efficient.
>
> Another potential problem is that the data is not synchronized among
> the rings. If one data center goes down, the data stored there will
> get lost. One way to get around may be to use the
> NetworkTopologyStrategy. For example, with RF=3, for the ring in
> europe, we can specify 2 replicas in europe and 1 replica in america.
>
> Thanks!
>
> Yudong
>
>> -sd

Re: Location-aware replication based on objects' access pattern

Posted by Yudong Gao <st...@umich.edu>.
On Wed, Apr 6, 2011 at 3:55 AM, Sasha Dolgy <sd...@gmail.com> wrote:
> I had been asked this question from a strategy point of view, and
> referenced how linkedin.com appears to handle this.
>
> <assumption>
> Specific region data is stored on a ring in that region.  While based
> in the middle east, my linkedin.com profile was kept on the middle
> east part of linkedin.com ... when I moved back to europe, updated my
> city, my profile shifted from the middle east to europe ...
> </assumption>
>
> would it not be easier to manage multiple rings (one in each required
> geographic region) to suit the location aware use case?  This way you
> can grow out that region as necessary and invest less into the regions
> that aren't as busy ...
>
> would mean your application needs to be aware of the different regions
> and where data exists ... or make some initial assumptions as to where
> to find data ...
>
> - 1 ring for apac
> - 1 ring for europe
> - 1 ring for americas
> - 1 global ring (with nodes present in each region)
>
> the global ring maintains reference data on which ring a guid exists ...
>
> I've been playing with this concept on AWS ... the amount of data I
> have isn't significant, so I may not have run into problems that will
> occur when i get to large amounts of data ...
>

This is interesting. But how do you design the global ring to make
sure that it is not the bottleneck? For example, if a client needs to
access data in the US ring but first has to talk to a Europe
node to get the reference data, this will not be efficient.

Another potential problem is that the data is not synchronized among
the rings. If one data center goes down, the data stored there will
be lost. One way around this may be to use
NetworkTopologyStrategy. For example, with RF=3 for the ring in
Europe, we can specify 2 replicas in Europe and 1 replica in America.
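
The "2 replicas in Europe, 1 in America" idea can be sketched as a
toy model. This is not Cassandra code: the ring layout, node names,
and the place_replicas function are made up for illustration, and
rack awareness is ignored. It only shows the general shape of walking
the ring and filling a per-data-center replica quota.

```python
from bisect import bisect_left

# Toy ring: (token, node, data center), sorted by token. Names are hypothetical.
RING = [
    (10, "eu-1", "europe"),
    (20, "us-1", "america"),
    (30, "eu-2", "europe"),
    (40, "us-2", "america"),
    (50, "eu-3", "europe"),
    (60, "us-3", "america"),
]


def place_replicas(token, ring, replicas_per_dc):
    """Walk the ring clockwise from the node that owns `token` and take
    nodes until every data center has its configured replica count."""
    tokens = [t for t, _, _ in ring]
    # The owner is the first node whose token is >= the key's token;
    # wrap around to the first node if the token is past the last one.
    start = bisect_left(tokens, token) % len(ring)
    remaining = dict(replicas_per_dc)
    chosen = []
    for step in range(len(ring)):
        _, node, dc = ring[(start + step) % len(ring)]
        if remaining.get(dc, 0) > 0:
            chosen.append(node)
            remaining[dc] -= 1
    return chosen


# RF=3 split as 2 replicas in Europe and 1 in America.
print(place_replicas(35, RING, {"europe": 2, "america": 1}))
```

Every key ends up with two European replicas and one American
replica no matter where its token falls, which is what protects the
data if one data center goes down.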

Thanks!

Yudong

> -sd
>
> On Wed, Apr 6, 2011 at 9:26 AM, Jonathan Colby <jo...@gmail.com> wrote:
>> good to see a discussion on this.
>>
>> This also has practical use for business continuity where you can control that the clients in a given data center first write replicas to its own data center, then to the other data center for backup.  If I understand correctly, a write takes the token into account first, then the replication strategy decides where the replicas go.   I would like to see the the first writes to be based on "location" instead of token -   whether that is accomplished by manipulating the key or some other mechanism.
>>
>> That way, if you do suffer the loss of a data center,  the clients are guaranteed to meet quorum on the nodes in its own data center  (given  a mirrored architecture across 2 data centers).
>>
>> We have 2 data centers.  If one goes down we have the problem that quorum cannot be satisfied for half of the reads.
>

Re: Location-aware replication based on objects' access pattern

Posted by Sasha Dolgy <sd...@gmail.com>.
I had been asked this question from a strategy point of view, and
referenced how linkedin.com appears to handle this.

<assumption>
Specific region data is stored on a ring in that region.  While based
in the middle east, my linkedin.com profile was kept on the middle
east part of linkedin.com ... when I moved back to europe, updated my
city, my profile shifted from the middle east to europe ...
</assumption>

would it not be easier to manage multiple rings (one in each required
geographic region) to suit the location aware use case?  This way you
can grow out that region as necessary and invest less into the regions
that aren't as busy ...

would mean your application needs to be aware of the different regions
and where data exists ... or make some initial assumptions as to where
to find data ...

- 1 ring for apac
- 1 ring for europe
- 1 ring for americas
- 1 global ring (with nodes present in each region)

the global ring maintains reference data on which ring a guid exists ...

I've been playing with this concept on AWS ... the amount of data I
have isn't significant, so I may not have run into problems that will
occur when i get to large amounts of data ...
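
A minimal sketch of the layout above, assuming each ring can be
modelled as a simple key/value store. The Ring and RegionRouter
classes are hypothetical stand-ins, not a Cassandra client API; the
point is only that the global ring holds guid -> region, while the
data itself lives in exactly one regional ring.

```python
class Ring:
    """Stand-in for one cluster; a dict plays the role of its storage."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


class RegionRouter:
    """Application-side routing: look up the region in the global ring,
    then talk to the ring for that region."""

    def __init__(self, global_ring, regional_rings):
        self.global_ring = global_ring
        self.regional = regional_rings

    def write(self, guid, region, value):
        self.regional[region].put(guid, value)
        self.global_ring.put(guid, region)  # small reference record only

    def read(self, guid):
        region = self.global_ring.get(guid)
        if region is None:
            return None
        return self.regional[region].get(guid)

    def move(self, guid, new_region):
        """Relocate a row, e.g. when a user changes their home city."""
        old_region = self.global_ring.get(guid)
        if old_region is None or old_region == new_region:
            return
        value = self.regional[old_region].rows.pop(guid)
        self.regional[new_region].put(guid, value)
        self.global_ring.put(guid, new_region)


rings = {name: Ring(name) for name in ("apac", "europe", "americas")}
router = RegionRouter(Ring("global"), rings)
router.write("guid-1", "europe", {"city": "Zurich"})
router.move("guid-1", "americas")
print(router.read("guid-1"))
```

The trade-off is that the application has to carry this routing
logic itself, and every read pays one extra lookup in the global ring.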

-sd

On Wed, Apr 6, 2011 at 9:26 AM, Jonathan Colby <jo...@gmail.com> wrote:
> good to see a discussion on this.
>
> This also has practical use for business continuity where you can control that the clients in a given data center first write replicas to its own data center, then to the other data center for backup.  If I understand correctly, a write takes the token into account first, then the replication strategy decides where the replicas go.   I would like to see the the first writes to be based on "location" instead of token -   whether that is accomplished by manipulating the key or some other mechanism.
>
> That way, if you do suffer the loss of a data center,  the clients are guaranteed to meet quorum on the nodes in its own data center  (given  a mirrored architecture across 2 data centers).
>
> We have 2 data centers.  If one goes down we have the problem that quorum cannot be satisfied for half of the reads.

Re: Location-aware replication based on objects' access pattern

Posted by Jonathan Colby <jo...@gmail.com>.
good to see a discussion on this. 

This also has practical use for business continuity, where you can ensure that the clients in a given data center first write replicas to their own data center, then to the other data center for backup.  If I understand correctly, a write takes the token into account first, then the replication strategy decides where the replicas go.  I would like to see the first writes be based on "location" instead of token, whether that is accomplished by manipulating the key or some other mechanism.

That way, if you do suffer the loss of a data center, the clients are guaranteed to meet quorum on the nodes in their own data center (given a mirrored architecture across 2 data centers).

We have 2 data centers.  If one goes down we have the problem that quorum cannot be satisfied for half of the reads.
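
To make the quorum arithmetic concrete: a quorum is floor(RF/2) + 1
replicas. The sketch below assumes, purely for illustration, RF=3
split 2+1 across two data centers; the message above does not state
the actual replication factor or layout, and dc1/dc2 are made-up
names.

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a QUORUM operation."""
    return replication_factor // 2 + 1


def can_meet_quorum(replicas_by_dc, failed_dc):
    """True if quorum is still reachable after losing one data center."""
    rf = sum(replicas_by_dc.values())
    alive = rf - replicas_by_dc.get(failed_dc, 0)
    return alive >= quorum(rf)


# Rows whose token puts two replicas in dc1 and one in dc2.
rows_a = {"dc1": 2, "dc2": 1}
# Rows whose token puts two replicas in dc2 and one in dc1.
rows_b = {"dc1": 1, "dc2": 2}

print(can_meet_quorum(rows_a, failed_dc="dc2"))  # 2 of 3 replicas remain
print(can_meet_quorum(rows_b, failed_dc="dc2"))  # only 1 of 3 remains
```

Under that assumption, if token order decides which data center gets
the two-replica side, roughly half the rows fall into each group, so
losing one data center breaks quorum for about half the reads.
Placing the majority of replicas by location instead would keep
quorum available for the clients of the surviving data center.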


On Apr 6, 2011, at 6:00 AM, Jonathan Ellis wrote:

> On Tue, Apr 5, 2011 at 10:45 PM, Yudong Gao <st...@umich.edu> wrote:
>>> A better solution would be to just push the DecoratedKey into the
>>> ReplicationStrategy so it can make its decision before information is
>>> thrown away.
>> 
>> I agree. So in this case, I guess the hashed based token ring is still
>> preserved to avoid hot spot, but we further use the DecoratedKey to
>> guide the replication strategy. For example, replica 2 is placed in
>> the first node along the ring the belongs the desirable data center
>> (based on the location hint embedded DecoratedKey). But we may not be
>> able to control the primary replica. Do you think this will be a
>> reasonable design?
> 
> calculateNaturalEndpoints has complete freedom to generate all
> replicas any way it likes.  Thinking of an endpoint as "primary"
> because it was generated first by one algorithm is dangerous.
> 
> As one of the docstrings explains, replica destinations ("endpoints")
> should be considered a Set even though we use a List for efficiency.
> None of them are special at the ReplicationStrategy level.
> 
>> Just curious, are they happy with the current
>> solution with keyspace, and is there some requests for per-row
>> placement control?
> 
> Enough people want to try it that we have the ticket open. :)
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com


Re: Location-aware replication based on objects' access pattern

Posted by Jonathan Ellis <jb...@gmail.com>.
On Tue, Apr 5, 2011 at 10:45 PM, Yudong Gao <st...@umich.edu> wrote:
>> A better solution would be to just push the DecoratedKey into the
>> ReplicationStrategy so it can make its decision before information is
>> thrown away.
>
> I agree. So in this case, I guess the hashed based token ring is still
> preserved to avoid hot spot, but we further use the DecoratedKey to
> guide the replication strategy. For example, replica 2 is placed in
> the first node along the ring the belongs the desirable data center
> (based on the location hint embedded DecoratedKey). But we may not be
> able to control the primary replica. Do you think this will be a
> reasonable design?

calculateNaturalEndpoints has complete freedom to generate all
replicas any way it likes.  Thinking of an endpoint as "primary"
because it was generated first by one algorithm is dangerous.

As one of the docstrings explains, replica destinations ("endpoints")
should be considered a Set even though we use a List for efficiency.
None of them are special at the ReplicationStrategy level.
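
As a rough illustration of that freedom, here is a toy model of a
strategy that sees the whole key rather than just the token. It is
not the real ReplicationStrategy interface: the "hint:id" key format,
HINT_TO_DC, the ring layout, and calculate_endpoints are all
assumptions made for this sketch. The hash still decides where on the
ring the walk starts, the hint decides which data center must be
included, and the result is returned as a set with no "primary".

```python
import hashlib
from bisect import bisect_left

# Hypothetical mapping from a key prefix to a preferred data center.
HINT_TO_DC = {"ca": "us-west", "ny": "us-east"}

# Toy ring: (token, node, data center), tokens in the range 0..99.
RING = [
    (0, "w1", "us-west"), (17, "e1", "us-east"), (33, "w2", "us-west"),
    (50, "e2", "us-east"), (67, "w3", "us-west"), (83, "e3", "us-east"),
]


def token_for(key):
    """Hash the full key so load still spreads evenly over the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % 100


def calculate_endpoints(key, ring, rf):
    """Return rf replica nodes, at least one in the hinted data center."""
    preferred_dc = HINT_TO_DC.get(key.split(":", 1)[0])
    tokens = [t for t, _, _ in ring]
    start = bisect_left(tokens, token_for(key)) % len(ring)
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]

    # First node met in each data center, in ring order from the token.
    first_in_dc = {}
    for _, node, dc in walk:
        first_in_dc.setdefault(dc, node)

    ordered = []
    if preferred_dc in first_in_dc:
        ordered.append(first_in_dc[preferred_dc])
    # Then one node from each other data center, for reliability.
    ordered += [n for dc, n in first_in_dc.items() if dc != preferred_dc]
    # Then any remaining nodes in ring order, if rf is larger still.
    ordered += [n for _, n, _ in walk if n not in ordered]
    return frozenset(ordered[:rf])


print(sorted(calculate_endpoints("ca:listing42", RING, 2)))
```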

> Just curious, are they happy with the current
> solution with keyspace, and is there some requests for per-row
> placement control?

Enough people want to try it that we have the ticket open. :)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Location-aware replication based on objects' access pattern

Posted by Yudong Gao <st...@umich.edu>.
On Tue, Apr 5, 2011 at 9:59 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> On Tue, Apr 5, 2011 at 8:37 PM, Yudong Gao <st...@umich.edu> wrote:
>> One thing I am worrying about is how to maintain the location
>> information for each row. The current partitioner maps a key to MD5
>> hash, and it is almost impossible to control the hashed token by
>> manipulating the value of the key. Also, maintaining a key-to-location
>> mapping would be unscalable. My initial thought is to use the key
>> string as the token directly, so that the location information can be
>> binded into the key. This minimize the changes to the other
>> components.
>
> This is what ByteOrderedPartitioner does, but that tends to create hot
> spots since sequential keys are stored on the same node.
>
> A better solution would be to just push the DecoratedKey into the
> ReplicationStrategy so it can make its decision before information is
> thrown away.

I agree. So in this case, I guess the hash-based token ring is still
preserved to avoid hot spots, but we further use the DecoratedKey to
guide the replication strategy. For example, replica 2 is placed on
the first node along the ring that belongs to the desired data center
(based on the location hint embedded in the DecoratedKey). But we may
not be able to control the primary replica. Do you think this would be
a reasonable design?

>
>> Do you know how the existing application is achieving this without the
>> per-row support?
>
> All existing applications places replicas by keyspace, not by row.
>

I see. So I guess each keyspace is mapped to one data center in the
desired location? Just curious, are they happy with the current
keyspace-based solution, and are there any requests for per-row
placement control?

Thanks!

Yudong

> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>

Re: Location-aware replication based on objects' access pattern

Posted by Jonathan Ellis <jb...@gmail.com>.
On Tue, Apr 5, 2011 at 8:37 PM, Yudong Gao <st...@umich.edu> wrote:
> One thing I am worrying about is how to maintain the location
> information for each row. The current partitioner maps a key to MD5
> hash, and it is almost impossible to control the hashed token by
> manipulating the value of the key. Also, maintaining a key-to-location
> mapping would be unscalable. My initial thought is to use the key
> string as the token directly, so that the location information can be
> binded into the key. This minimize the changes to the other
> components.

This is what ByteOrderedPartitioner does, but that tends to create hot
spots since sequential keys are stored on the same node.

A better solution would be to just push the DecoratedKey into the
ReplicationStrategy so it can make its decision before information is
thrown away.
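
The hot-spot effect is easy to see in a toy comparison. The sketch
below is not Cassandra code: the four-node cluster, the SPLITS
boundaries, and the key format are assumptions for illustration. It
places the same sequential keys once by MD5 hash and once by raw key
order, and counts how many rows each node receives.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

NODES = 4

# Order-preserving layout: node 0 owns keys below "g", node 1 owns
# "g" up to "n", node 2 owns "n" up to "t", node 3 owns the rest.
SPLITS = ["g", "n", "t"]


def md5_node(key):
    """Hash placement: the MD5 space is cut into NODES equal slices."""
    token = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return token * NODES // 2 ** 128


def ordered_node(key):
    """Order-preserving placement: the key itself acts as the token."""
    return bisect_right(SPLITS, key)


def load_per_node(keys, placement):
    """Count how many keys each node would store."""
    return Counter(placement(k) for k in keys)


# Sequential keys that all carry the same location prefix.
keys = ["ca:%04d" % i for i in range(1000)]

print("hashed :", sorted(load_per_node(keys, md5_node).items()))
print("ordered:", sorted(load_per_node(keys, ordered_node).items()))
```

With ordered placement every "ca:" key lands on the same node, while
the hashed placement spreads them over all four. That is the reason
for keeping the hashed token and giving the replication strategy the
full key, rather than encoding location in the token itself.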

> Do you know how the existing application is achieving this without the
> per-row support?

All existing applications place replicas by keyspace, not by row.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Location-aware replication based on objects' access pattern

Posted by Yudong Gao <st...@umich.edu>.
Thanks for the reply, Jonathan!

This per-row control is exactly what I need. I will be happy to help
tackle it in the long term. Is there any further information or a plan
for this issue?

One thing I am worried about is how to maintain the location
information for each row. The current partitioner maps a key to its MD5
hash, and it is almost impossible to control the hashed token by
manipulating the value of the key. Also, maintaining a key-to-location
mapping would not scale. My initial thought is to use the key
string as the token directly, so that the location information can be
bound into the key. This minimizes the changes to the other
components.

Another problem for me is that we have a deadline coming
soon, so we need to get something up and running quickly. It does not
need to be perfect or general, and some quick tricks will be sufficient.
Do you know how existing applications achieve this without
per-row support?

Thanks!

Yudong

On Tue, Apr 5, 2011 at 6:39 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>
> You'd really want https://issues.apache.org/jira/browse/CASSANDRA-2369
> to control per-row. Let me know if you'd like to help tackle that.
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

Re: Location-aware replication based on objects' access pattern

Posted by Jonathan Ellis <jb...@gmail.com>.
You'd really want https://issues.apache.org/jira/browse/CASSANDRA-2369
to control per-row. Let me know if you'd like to help tackle that.

On Tue, Apr 5, 2011 at 5:05 PM, Yudong Gao <st...@umich.edu> wrote:
>
> Hi,
>
> I am thinking about using Cassandra for our research project, and we
> are thinking about one interesting feature.
>
> Our setup has multiple datacenters located in different geography
> locations. Data is accessed with predictable patterns. Think of
> something like Craigslist, data objects corresponding to CA will
> mostly accessed by users from the west cost. If this case, if all the
> replicas are stored in the east coast, the access would not be
> efficient. Other applications such as Facebook, should also have
> similar concern.
>
> I am aware of the placement strategies such as
> RackAwareStrategy/NetworkTopologyStrategy. But they place objects
> based on their hashed token, but not they access pattern. I am
> thinking about one possible trick, which is to manipulate the key of
> the object based on its access pattern, so that the key can be mapped
> to a token that will have at least one replica (ideally the primary
> replica) stored in the desired data center, and the other replicas
> stored in other data centers for reliability concern.
>
> I found this post discussing a similar problem,
>
> http://www.mail-archive.com/user@cassandra.apache.org/msg00695.html
>
> but Ben suggested just writing one new replication strategy. IMO, this
> location-aware replication should be one common problem for Cassandra,
> especially since it has been widely used in many large-scale
> commercial applications such as Facebook and Twitter. I am interested
> in how they handle this problem.
>
> Is there any existing solution that I refer to and get start with?
>
> Thanks!
>
> Yudong
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com