Posted to user@cassandra.apache.org by Jason Horman <jh...@gmail.com> on 2010/10/08 19:36:41 UTC

Cold boot performance problems

We are experiencing very slow performance on Amazon EC2 after a cold boot,
around 10-20 tps. Once the cache is primed things are much better, but it would
be nice if users whose data isn't in the cache didn't see such slow performance.

Before dumping a bunch of config I just had some general questions.

   - We are using uuid keys, 40m of them, with the random partitioner. The
   typical access pattern is reading 200-300 keys in a single web request. Are
   uuid keys going to be painful because they are so random? Should we be using
   less random keys, maybe with a shard prefix (01-80, see the sketch after
   this list), and make sure that our tokens group user data together on the
   cluster (via the order preserving partitioner)?
   - Would the order preserving partitioner be a better option in the sense
   that it would group a single user's data onto a single set of machines (if
   we added a prefix to the uuid)?
   - Is there any benefit to doing sharding of our own via Keyspaces, i.e.
   01-80 keyspaces to split up the data files? (We already have 80 mysql shards
   we are migrating from, so doing this wouldn't be terrible implementation-wise.)
   - Should a goal be to get the data/index files as small as possible? Is
   there a size at which they become problematic? (Amazon EC2/EBS, fyi)
      - Via more servers
      - Via more cassandra instances on the same server
      - Via manual sharding by keyspace
      - Via manual sharding by columnfamily
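
To make the prefix idea concrete, something like the sketch below is what I
have in mind (the md5 bucketing and the ':' separator are arbitrary
placeholders, not something we run):

# Sketch only: derive a stable 01-80 shard prefix from a uuid and build a row
# key of the form "NN:uuid". The md5 bucketing and separator are placeholders.
uuid=$(uuidgen)
bucket=$(( 0x$(printf '%s' "$uuid" | md5sum | cut -c1-4) % 80 + 1 ))
key=$(printf '%02d:%s' "$bucket" "$uuid")
echo "$key"   # e.g. 37:0f8b7c52-d2d8-4a6b-9c9e-1f2e3d4c5b6a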

Thanks,

-- 
-jason horman

Re: Cold boot performance problems

Posted by aaron morton <aa...@thelastpickle.com>.
Creating more ColumnFamilies in more Keyspaces creates more memory overhead. I do not believe sharding your data is the way to go with cassandra. 

You mentioned that you read 200 to 300 keys per request, and it sounded like all of this data was for a single user. If you can group all the user data into a single row (or a bounded number of rows, 2 or 3), your cassandra requests should be more performant, as fewer machines and less overall IO will be involved in each request. 
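
Roughly what I mean, as a sketch against the 0.6 command line client (the
keyspace and column family names are made up and would need to exist in your
storage-conf.xml):

# Sketch only: "Keyspace1"/"UserItems" are made-up names that would have to be
# defined in storage-conf.xml. One wide row per user replaces ~300 point reads.
printf '%s\n' \
  "set Keyspace1.UserItems['user42']['item-0001'] = 'item data'" \
  "set Keyspace1.UserItems['user42']['item-0002'] = 'item data'" \
  "get Keyspace1.UserItems['user42']" \
  | bin/cassandra-cli --host localhost --port 9160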

Aaron


Re: Cold boot performance problems

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
On Fri, Oct 08, 2010 at 05:31:28PM -0700, Dave Viner wrote:
> Has anyone found solid step-by-step docs on how to raid0 the ephemeral disks
> in ec2 for use by Cassandra?

No, but here's a script I used to raid0 the 3 ephemerals in an xlarge instance.
You can edit the config section at the top for a different number of
ephemerals.

-Anthony

-----8<-----8<--- begin script ------>8----->8----

#!/bin/bash

########################
# config stuff to edit #
########################

# partitions to use
partitions="/dev/sdb /dev/sdd /dev/sde"

# tempfile to use
tempfile="/tmp/fdisk.raid"

# devices in raid
raidcount=3
raidparts="/dev/sdb1 /dev/sdd1 /dev/sde1"

# raiddev to use
raiddev="/dev/md0"


function buildraid () {

#######################################
# script, typically stuff not to edit #
#######################################

# loop through partitions, create the fdisk file and then create partitions
for partition in $partitions; do

# create an input file for fdisk: n = new partition, p = primary, 1 = partition
# number, the two blank lines accept the default first/last cylinders, t/fd sets
# the type to Linux raid autodetect, w writes the table
cat > "$tempfile" << EOF
n
p
1


t
fd
w
EOF

# partition the disks
echo "Partitioning $partition..."
fdisk $partition < $tempfile

# remove the temp file
rm -f "$tempfile"

done

echo "Creating RAID device $raiddev..."
mdadm --create --verbose $raiddev --level=raid0 --raid-devices=$raidcount $raidparts

echo "Formatting RAID device $raiddev using ext3..."
mkfs.ext3 $raiddev

echo "RAID device $raiddev read to use.  Mount using mount -t ext3 $raiddev <mount point>"
}

function makefstab () {

tempfstab="/tmp/fstab"
cat > "$tempfstab" << EOF
/dev/sda1  /         ext3    defaults        1 1
/dev/sdc2  /mnt2     ext3    defaults        0 0
/dev/md0   /mnt      ext3    defaults        0 0
none       /dev/pts  devpts  gid=5,mode=620  0 0
none       /dev/shm  tmpfs   defaults        0 0
none       /proc     proc    defaults        0 0
none       /sys      sysfs   defaults        0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs defaults 0 0
EOF

cp /tmp/fstab /etc/fstab

}

# build raid with 3 disks
if test ! -f /var/lib/lock/builtraid.lock ;
  then
    touch /var/lib/lock/builtraid.lock
    umount /mnt
    buildraid
    makefstab
    mount /mnt
  fi

-----8<-----8<--- end script ------>8----->8----
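
To use it, something along these lines (the path and filename are just wherever
you save the script):

# save the script, make it executable, and run it once as root on first boot
chmod +x /usr/local/bin/build-ephemeral-raid.sh
sudo /usr/local/bin/build-ephemeral-raid.sh
cat /proc/mdstat   # md0 should show raid0 across the ephemeral partitions
df -h /mnt         # the striped filesystem should be mounted at /mnt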

Re: Cold boot performance problems

Posted by Dave Viner <da...@pobox.com>.
Has anyone found solid step-by-step docs on how to raid0 the ephemeral disks
in ec2 for use by Cassandra?


Re: Cold boot performance problems

Posted by Jason Horman <jh...@gmail.com>.
We are currently using EBS with 4 volumes striped with LVM. Wow, we
didn't realize you could raid the ephemeral disks. I thought the
prevailing opinion for Cassandra, though, was that the ephemeral disks
were dangerous. We have lost a few machines over the past year, but
replicas hopefully prevent real trouble.
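
For reference, our stripe is roughly equivalent to this sketch (the device
names, stripe size, and mount point here are placeholders, not our exact setup):

# placeholder EBS device names; attach the four volumes first
pvcreate /dev/sdf /dev/sdg /dev/sdh /dev/sdi
vgcreate cassandra_vg /dev/sdf /dev/sdg /dev/sdh /dev/sdi
# -i 4 stripes across all four volumes, -I 256 uses a 256 KiB stripe size
lvcreate -i 4 -I 256 -l 100%FREE -n data cassandra_vg
mkfs.ext3 /dev/cassandra_vg/data
mount /dev/cassandra_vg/data /var/lib/cassandra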

How about the sharding strategies? Is it worth it to investigate
sharding out via multiple keyspaces? Would order preserving
partitioning help group indexes better for users?

-- 
-jason

Re: Cold boot performance problems

Posted by Jonathan Ellis <jb...@gmail.com>.
Two things that can help:

In 0.6.5, enable the dynamic snitch with

-Dcassandra.dynamic_snitch_enabled=true
-Dcassandra.dynamic_snitch=true

which, if you are doing a rolling restart, will let other nodes route
around the slow node (at CL.ONE) until it has warmed up (by the read
repairs happening in the background).
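
For example, you can append them to JVM_OPTS in bin/cassandra.in.sh (or
wherever your startup scripts set JVM options); double-check the exact property
names against your 0.6.5 build:

# Sketch: add the dynamic snitch flags to the Cassandra JVM options
# (property names as above; verify them against your 0.6.5 build).
JVM_OPTS="$JVM_OPTS -Dcassandra.dynamic_snitch_enabled=true"
JVM_OPTS="$JVM_OPTS -Dcassandra.dynamic_snitch=true"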

In 0.6.6, we've added save/load of the Cassandra caches:
https://issues.apache.org/jira/browse/CASSANDRA-1417

Finally: we recommend using raid0 ephemeral disks on EC2 with L or XL
instance sizes for better i/o performance.  (Corey Hulen has some
numbers at http://www.coreyhulen.org/?p=326.)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com