You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by Apache Wiki <wi...@apache.org> on 2010/06/15 02:51:36 UTC

[Cassandra Wiki] Update of "CloudConfig" by DaveViner

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "CloudConfig" page has been changed by DaveViner.
The comment on this change is: step-by-step guide to launching Cassandra on EC2..
http://wiki.apache.org/cassandra/CloudConfig?action=diff&rev1=1&rev2=2

--------------------------------------------------

  = Setting up Cassandra in the Cloud =
- 
- ''If you have done work to optimize your Cassandra install in the cloud, please take a moment to contribute some of that knowledge to this page'' 
+ ''If you have done work to optimize your Cassandra install in the cloud, please take a moment to contribute some of that knowledge to this page''
- 
  
  == Amazon Web Services (AWS/EC2) ==
- 
   * There is an ec2snitch to make Cassandra rack-aware in the ec2 cloud.
   * [[http://github.com/b/cookbooks/tree/master/cassandra|Chef install for Cassandra]], including ec2snitch setup.
  
  === Optimizing Volume Performance for a Transient Cluster ===
- 
- Depending on node size, and on how many EBS volumes are attached, most EC2 nodes will have many independent attached volumes. 
+ Depending on node size, and on how many EBS volumes are attached, most EC2 nodes will have many independent attached volumes.
  
   * How should the Cassandra config be modified to take advantage of multiple attached volumes?
   * What are the tradeoffs for EBS vs local drives as backing store for a persistent cluster?
@@ -21, +17 @@

  
   * For a non-persistent cluster, can Cassandra take advantage of the scratch disks (assume they are fast but could disappear at once across the whole cluster at any time)
  
+ == Step-By-Step Guide to Installing Cassandra on EC2 & Debian ==
+ 
+ === Assumptions ===
+ 
+ We will assume that the goal is to install Cassandra in a multi-Availability Zone configuration.  However, all nodes must be in one Region because we will use the private IP addresses for the nodes to talk to each other.
+ 
+ We also will setup Security Groups for the Cassandra nodes to talk to one another, and also for other nodes to talk to Cassandra. 
+ 
+ In the course of this document, we reference 'lwp-request' and 'ec2_signer.pl'.  These are just simple perl programs that send HTTP requests (lwp-request) and that construct a signed URL based on the parameters given (ec2_signer.pl).
+ 
+ === Steps ===
+ 
+ ==== Step 1. Setup the "talk to Cassandra" Security Group ====
+ 
+ This group will contain any machine which desires to communicate with Cassandra.
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=CreateSecurityGroup&GroupName=Talk+To+Cassandra+Local+Zone&GroupDescription=Group+for+any+machine+that+talks+to+Cassandra"`
+ }}}
+ 
+ Also, let us open SSH to the machines in this group.
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=AuthorizeSecurityGroupIngress&GroupName=Talk+To+Cassandra+Local+Zone&IpPermissions.1.IpProtocol=tcp&IpPermissions.1.FromPort=22&IpPermissions.1.ToPort=22&IpPermissions.1.IpRanges.1.CidrIp=0.0.0.0/0"`
+ }}}
+ 
+ If you are just testing stuff, here's how you can delete the Security Group.
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=DeleteSecurityGroup&GroupName=Talk+To+Cassandra+Local+Zone"`
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=DeleteSecurityGroup&GroupName=Cassandra+Nodes"`
+ }}}
+ 
+ ==== Step 2. Get the OwnerID of the "talk to Cassandra" Security Group. ====
+ 
+ We need to know what the numeric OwnerID of the security group is.  Write down the value returned here.
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=DescribeSecurityGroups&GroupName.1=Talk+To+Cassandra+Local+Zone"`
+ }}}
+ 
+ It will be something like 2931201231.
+ 
+ ==== Step 3. Create the "Cassandra Nodes" Security Group. ====
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=CreateSecurityGroup&GroupName=Cassandra+Nodes&GroupDescription=Group+for+any+Cassandra+machine"`
+ }}}
+ 
+ Open SSH so we can get there as well.
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=AuthorizeSecurityGroupIngress&GroupName=Cassandra+Nodes&IpPermissions.1.IpProtocol=tcp&IpPermissions.1.FromPort=22&IpPermissions.1.ToPort=22&IpPermissions.1.IpRanges.1.CidrIp=0.0.0.0/0"`
+ }}}
+ 
+ ==== Step 4. Get the OwnerID of the "Cassandra Nodes" Security Group. ====
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=DescribeSecurityGroups&GroupName.1=Cassandra+Nodes"`
+ }}}
+ 
+ ==== Step 5. Allow access between the Cassandra nodes on the Cassandra ports ====
+ 
+ There are 3 ports that Cassandra nodes use to talk to each other: Gossip, Thrift, and JMX.
+ 
+ # Gossip port
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=AuthorizeSecurityGroupIngress&GroupName=Cassandra+Nodes&IpPermissions.1.IpProtocol=tcp&IpPermissions.1.FromPort=7000&IpPermissions.1.ToPort=7000&IpPermissions.1.Groups.1.UserId=152252226102&IpPermissions.1.Groups.1.GroupName=Cassandra+Nodes"`
+ }}}
+ 
+ # Thrift Port
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=AuthorizeSecurityGroupIngress&GroupName=Cassandra+Nodes&IpPermissions.1.IpProtocol=tcp&IpPermissions.1.FromPort=9160&IpPermissions.1.ToPort=9160&IpPermissions.1.Groups.1.UserId=152252226102&IpPermissions.1.Groups.1.GroupName=Cassandra+Nodes"`
+ }}}
+ 
+ # JMX Port
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=AuthorizeSecurityGroupIngress&GroupName=Cassandra+Nodes&IpPermissions.1.IpProtocol=tcp&IpPermissions.1.FromPort=8080&IpPermissions.1.ToPort=8080&IpPermissions.1.Groups.1.UserId=152252226102&IpPermissions.1.Groups.1.GroupName=Cassandra+Nodes"`
+ }}}
+ 
+ ==== Step 6. Allow access between "Talk to Cassandra" nodes and the Cassandra nodes themselves. ====
+ 
+ The non-Cassandra nodes use Thrift to talk to Cassandra.  So we need to open that port to the talkers.  We also open the JMX port so that monitoring can occur.
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=AuthorizeSecurityGroupIngress&GroupName=Cassandra+Nodes&IpPermissions.1.IpProtocol=tcp&IpPermissions.1.FromPort=9160&IpPermissions.1.ToPort=9160&IpPermissions.1.Groups.1.UserId=152252226102&IpPermissions.1.Groups.1.GroupName=Talk+To+Cassandra+Local+Zone"`
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=AuthorizeSecurityGroupIngress&GroupName=Cassandra+Nodes&IpPermissions.1.IpProtocol=tcp&IpPermissions.1.FromPort=8080&IpPermissions.1.ToPort=8080&IpPermissions.1.Groups.1.UserId=152252226102&IpPermissions.1.Groups.1.GroupName=Talk+To+Cassandra+Local+Zone"`
+ }}}
+ 
+ ==== Step 7. Create a Key Pair ====
+ 
+ You might already have a key pair.  You must have a key-pair to log into an EC2 instance.  If you already have one and have the private key for it, you can safely skip this step.
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=CreateKeyPair&KeyName=CassandraLauncher-East"`
+ }}}
+ 
+ Record the keyMaterial in the response.
+ 
+ ==== Step 8. Pick the correct seed instances. ====
+ 
+ For Cassandra, we use 64bit Debian Lenny.  Currently, the available AMIs are:
+ 
+ * us-east-1: ami-f0f61599
+ * us-west-1: ami-4d3d6c08
+ * eu-west-1: ami-80446ff4
+ * ap-southeast-1: ami-a3f38cf1
+ 
+ These come from http://alestic.com/.
+ 
+ ==== Step 9.  Select the Availability Zone you want. ====
+ 
+ For the us-east-1, there are 4 AZs:
+ 
+ * us-east-1a
+ * us-east-1b
+ * us-east-1c
+ * us-east-1d
+ 
+ ==== Step 10. Start up your SEED instance ====
+ 
+ NOTE, the KeyName must be one for which you have the security key (from the KeyPair).  In the example below, we use the key name 'dviner' and the AZ us-east-1a.  Replace these with your key name and AZ selection.
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=RunInstances&ImageId=ami-f0f61599&MinCount=1&MaxCount=1&KeyName=dviner&SecurityGroup.1=Cassandra+Nodes&InstanceType=m1.large&DisableApiTermination=false&Monitoring.Enabled=false&Placement.AvailabilityZone=us-east-1a"`
+ }}}
+ 
+ RECORD THE INSTANCE ID (/RunInstancesResponse/instancesSet/item/instanceId)
+ 
+ ==== Step 11. Get the IP address of the SEED instance ====
+ 
+ 
+ You must insert the InstanceId obtained in Step 10.
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=DescribeInstances&InstanceId.1=INSTANCE-ID"`
+ }}}
+ 
+ ==== Step 12. Setup Cassandra on SEED ====
+ 
+ See below for this information.
+ 
+ ==== Step 13. Start up a NON-SEED instance ====
+ 
+ This is exactly like launching a SEED instance.
+ 
+ ==== Step 14. Setup Cassandra on NON-SEED ====
+ 
+ See below.
+ 
+ ==== Step 15. Shutting Down an Instance ====
+ 
+ {{{
+ % lwp-request -SUse `perl ../../ec2/ec2_signer.pl "https://ec2.us-east-1.amazonaws.com/?Action=TerminateInstances&InstanceId=INSTANCE-ID"`
+ }}}
+ 
+ === Cassandra Basic Setup ===
+ 
+ The steps here assume that you have an instance running and you can connect to it.  These steps are applicable for both seed nodes and non-seed nodes.
+ 
+ ==== Step 1. Add the Cassandra APT repository ====
+ 
+ {{{
+ % cat setup/cassandra.list 
+     # taken from http://wiki.apache.org/cassandra/DebianPackaging
+     deb http://www.apache.org/dist/cassandra/debian unstable main
+     deb-src http://www.apache.org/dist/cassandra/debian unstable main
+ % scp -i ../your-private-key cassandra.list root@YOUR-INSTANCE.amazonaws.com:/etc/apt/sources.list.d/
+ }}}
+ 
+ ==== Step 2. Add the GPG keys on the instance ====
+ 
+ {{{
+ % apt-get update
+ % apt-get upgrade
+ % gpg --keyserver wwwkeys.pgp.net --recv-keys F758CE318D77295D
+ % gpg --export --armor F758CE318D77295D | apt-key add -
+ }}}
+ 
+ ==== Step 3. Install the Debian package for Cassandra ====
+ 
+ {{{
+ % apt-get update
+ % apt-get install cassandra
+ }}}
+ 
+ At this point, Cassandra will be installed and running.  However, it's not configured for a multi-node cluster.  So we need to continue.
+ 
+ ==== Step 4. Turn off the default Cassandra ====
+ 
+ {{{
+ % /etc/init.d/cassandra stop
+ % rm /var/lib/cassandra/commitlog/*
+ % rm -r /var/lib/cassandra/data/
+ }}}
+ 
+ === Cassandra Seed & Non-Seed Configurations ===
+ 
+ In setting up the seed and non-seed nodes, we just alter the configuration file (which lives at /etc/cassandra/storage-conf.xml).  These are the important differences:
+ 
+ SEED CONF:
+ 
+ * LISTENADDRESS is IP of this node
+ * ThriftAddress is IP of this node (or 0.0.0.0)
+ * SEED is IP of this node
+ * AutoBootstrap to off
+ 
+ NON-SEED
+ 
+ * LISTENADDRESS is IP of this node
+ * ThriftAddress is IP of this node (or 0.0.0.0)
+ * SEED is the IP of the earlier seed
+ * AutoBootstrap to on
+ 
+ === Booting up Cassandra ===
+ 
+ After installing the appropriate config file, you boot up Cassandra:
+ 
+ ## WATCH LOG FILES
+ 
+ {{{
+ % tail -f /var/log/cassandra/output.log
+ % tail -f /var/log/cassandra/system.log
+ }}}
+ 
+ ## START CASS
+ 
+ {{{
+ % /etc/init.d/cassandra start
+ }}}
+ 
+ It definitely takes a few minutes for each new node to find all the other nodes.  You can query any node to see what it thinks the cluster is composed of:
+ 
+ {{{
+ % nodetool  -h localhost ring
+ }}}
+