Posted to user@cassandra.apache.org by Jason Horman <jh...@gmail.com> on 2010/10/08 19:36:41 UTC
Cold boot performance problems
We are experiencing very slow performance on Amazon EC2 after a cold boot:
10-20 tps. After the cache is primed things are much better, but it would be
nice if users who aren't in cache didn't experience such slow performance.
Before dumping a bunch of config, I just had some general questions.
- We are using uuid keys (40m of them) and the random partitioner. The typical
access pattern is reading 200-300 keys in a single web request. Are uuid
keys going to be painful because they are so random? Should we be using less
random keys, maybe with a shard prefix (01-80), and make sure that our
tokens group user data together on the cluster (via the order preserving
partitioner)?
- Would the order preserving partitioner be a better option in the sense
that it would group a single user's data onto a single set of machines (if we
added a prefix to the uuid)?
- Is there any benefit to doing sharding of our own via keyspaces, e.g. 01-80
keyspaces to split up the data files? (We already have 80 mysql shards we
are migrating from, so doing this wouldn't be terrible implementation-wise.)
- Should a goal be to get the data/index files as small as possible? Is
there a size at which they become problematic? (Amazon EC2/EBS, fyi.)
  - Via more servers
  - Via more cassandra instances on the same server
  - Via manual sharding by keyspace
  - Via manual sharding by columnfamily
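Concretely, the kind of prefixed key we're considering would look something
like this (a sketch only: the 80-way modulus just mirrors our mysql shards,
and hashing the first 8 hex chars is our own convention, not anything
Cassandra requires):

```shell
#!/bin/sh
# Sketch: derive a two-digit shard prefix (00-79) from the uuid's first
# 8 hex chars, then prepend it to the row key. The modulus of 80 mirrors
# our existing mysql shard count; it is an assumption, not a requirement.
uuid="6f1ed002ab5595859014ebf0951522d9"      # example uuid, hyphens stripped
head8=$(printf '%.8s' "$uuid")               # first 8 hex chars
shard=$(printf '%02d' $(( 0x$head8 % 80 )))  # hex arithmetic -> 00..79
echo "${shard}-${uuid}"
```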
Thanks,
--
-jason horman
Re: Cold boot performance problems
Posted by aaron morton <aa...@thelastpickle.com>.
Creating more ColumnFamilies in more Keyspaces creates more memory overhead. I do not believe sharding your data is the way to go with Cassandra.
You mentioned that you read 200 to 300 keys per request, and it sounded like all of this data is for a single user. If you can group all of a user's data into a single row (or a bounded number of rows, say 2 or 3), your Cassandra requests should perform better, since fewer machines and less overall IO will be involved in each request.
Aaron
Re: Cold boot performance problems
Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
On Fri, Oct 08, 2010 at 05:31:28PM -0700, Dave Viner wrote:
> Has anyone found solid step-by-step docs on how to raid0 the ephemeral disks
> in ec2 for use by Cassandra?
No, but here's a script I used to raid0 the 3 ephemerals on an xlarge instance.
You can edit the top part to handle a different number of ephemerals.
-Anthony
-----8<-----8<--- begin script ------>8----->8----
#!/bin/bash

########################
# config stuff to edit #
########################

# partitions to use
partitions="/dev/sdb /dev/sdd /dev/sde"
# tempfile to use
tempfile="/tmp/fdisk.raid"
# devices in raid
raidcount=3
raidparts="/dev/sdb1 /dev/sdd1 /dev/sde1"
# raiddev to use
raiddev="/dev/md0"

function buildraid () {
  #######################################
  # script, typically stuff not to edit #
  #######################################
  # loop through the disks, create the fdisk input file, then partition
  for partition in $partitions; do
    # create an input file for fdisk: new primary partition 1, accept the
    # default first/last cylinder (the two blank lines), set the type to
    # fd (linux raid autodetect), write
    cat > "$tempfile" << EOF
n
p
1


t
fd
w
EOF
    # partition the disk
    echo "Partitioning $partition..."
    fdisk "$partition" < "$tempfile"
    # remove the temp file
    rm -f "$tempfile"
  done
  echo "Creating RAID device $raiddev..."
  mdadm --create --verbose "$raiddev" --level=raid0 --raid-devices=$raidcount $raidparts
  echo "Formatting RAID device $raiddev using ext3..."
  mkfs.ext3 "$raiddev"
  echo "RAID device $raiddev ready to use. Mount using: mount -t ext3 $raiddev <mount point>"
}

function makefstab () {
  tempfstab="/tmp/fstab"
  cat > "$tempfstab" << EOF
/dev/sda1 / ext3 defaults 1 1
/dev/sdc2 /mnt2 ext3 defaults 0 0
/dev/md0 /mnt ext3 defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs defaults 0 0
none /proc proc defaults 0 0
none /sys sysfs defaults 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs defaults 0 0
EOF
  cp "$tempfstab" /etc/fstab
}

# build the raid once, guarded by a lock file
if test ! -f /var/lib/lock/builtraid.lock; then
  touch /var/lib/lock/builtraid.lock
  umount /mnt
  buildraid
  makefstab
  mount /mnt
fi
-----8<-----8<--- end script ------>8----->8----
Re: Cold boot performance problems
Posted by Dave Viner <da...@pobox.com>.
Has anyone found solid step-by-step docs on how to raid0 the ephemeral disks
in ec2 for use by Cassandra?
Re: Cold boot performance problems
Posted by Jason Horman <jh...@gmail.com>.
We are currently using EBS with 4 volumes striped with LVM. Wow, we
didn't realize you could raid the ephemeral disks. I thought the
prevailing opinion for Cassandra, though, was that the ephemeral disks
were dangerous. We have lost a few machines over the past year, but
replicas hopefully prevent real trouble.
How about the sharding strategies? Is it worth it to investigate
sharding out via multiple keyspaces? Would order preserving
partitioning help group indexes better for users?
--
-jason
Re: Cold boot performance problems
Posted by Jonathan Ellis <jb...@gmail.com>.
Two things that can help:
In 0.6.5, enable the dynamic snitch with
-Dcassandra.dynamic_snitch_enabled=true
-Dcassandra.dynamic_snitch=cassandra.dynamic_snitch_enabled
which, if you are doing a rolling restart, will let other nodes route
around the slow node (at CL.ONE) until it's warmed up (by the read
repairs in the background).
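(For anyone wondering where those -D flags go: one option, assuming a stock
bin/cassandra.in.sh from the 0.6 tarball, is to append them to JVM_OPTS
before starting the node. A sketch, with the file path and placement being
assumptions about your install layout:)

```shell
#!/bin/sh
# Sketch: extend JVM_OPTS with the dynamic snitch flag before launching a
# node, e.g. near the bottom of bin/cassandra.in.sh (path is an assumption;
# adjust for your install).
JVM_OPTS="$JVM_OPTS -Dcassandra.dynamic_snitch_enabled=true"
export JVM_OPTS
echo "JVM_OPTS now: $JVM_OPTS"
```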
In 0.6.6, we've added save/load of the Cassandra caches:
https://issues.apache.org/jira/browse/CASSANDRA-1417
Finally: we recommend using raid0 ephemeral disks on EC2 with L or XL
instance sizes for better i/o performance. (Corey Hulen has some
numbers at http://www.coreyhulen.org/?p=326.)
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com