Posted to commits@cassandra.apache.org by Apache Wiki <wi...@apache.org> on 2013/08/30 05:09:29 UTC

[Cassandra Wiki] Update of "GettingStarted" by JonathanEllis

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "GettingStarted" page has been changed by JonathanEllis:
https://wiki.apache.org/cassandra/GettingStarted?action=diff&rev1=91&rev2=92

Comment:
update for cqlsh

  == Cassandra documentation from DataStax ==
  !DataStax's latest [[http://www.datastax.com/docs/1.2/index|Cassandra documentation]] covers topics from installation to troubleshooting, including a [[http://www.datastax.com/docs/quick_start/quickstart|Quick Start Guide]].  Documentation for older releases is also available.
-  
+ 
  == Introduction ==
+ This document provides a few easy-to-follow steps that take the first-time user from installation, to running a single-node Cassandra instance, to an overview of configuring a multinode cluster. Cassandra is meant to run on a cluster of nodes, but will run equally well on a single machine. This is a handy way of getting familiar with the software while avoiding the complexities of a larger system.
+ 
- This document aims to provide a few easy to follow steps to take the first-time user from installation, to running single node Cassandra, and overview to configure multinode cluster.
- Cassandra is meant to run on a cluster of nodes, but will run equally well on a single machine. This is a handy way of getting familiar with the software while avoiding the complexities of a larger system.
-   
  == Step 0: Prerequisites and Connecting to the Community ==
  Cassandra requires the most stable version of Java 1.6 you can deploy, preferably the Oracle/Sun JVM.  Cassandra also runs on the IBM JVM, and should run on jrockit as well.
  
-  Note for OS X users:
+  . Note for OS X users:
   Some people running OS X have trouble getting Java 6 to work. If you've kept up with Apple's updates, Java 6 should already be installed (it comes in Mac OS X 10.5  Update 1). Unfortunately, Apple does not default to using it. What you have to do is change your `JAVA_HOME` environment setting to `/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home` and add `/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin` to the beginning of your `PATH`.
-  
+ 
  The best way to ensure you always have up to date information on the project, releases, stability, bugs, and features is to subscribe to the users mailing list ([[mailto:user-subscribe@cassandra.apache.org|subscription required]]) and participate in the #cassandra channel on [[http://webchat.freenode.net/?channels=#cassandra|IRC]].
+ 
+ <<Anchor(picking_a_version)>> <<Anchor(download_a_kit)>>
+ 
-  
- <<Anchor(picking_a_version)>>
- <<Anchor(download_a_kit)>>
-  
  == Step 1: Download Cassandra ==
-  
   * Download links for the latest stable release can always be found on the [[http://cassandra.apache.org/download|website]].
   * Users of Debian or Debian-based derivatives can install the latest stable release in package form, see DebianPackaging for details.
-  * Users of RPM-based distributions can get packages from [[http://www.datastax.com/docs/1.1/install/install_rpm|Datastax]].
+  * Users of RPM-based distributions can get packages from [[http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/install/installRHEL_t.html|Datastax]].
   * If you are interested in building Cassandra from source, please refer to [[HowToBuild|How to Build]] page.
-  
+ 
  For more details about misc builds, please refer to [[VersionsAndBuilds|Cassandra versions and builds]] page.
-  
+ 
  <<Anchor(running_a_single_node)>>
-  
+ 
  == Step 2: Basic Configuration ==
+ The Cassandra configuration files can be found in the `conf` directory of binary and source distributions. If you have installed Cassandra from a deb or rpm package, the configuration files will be located in `/etc/cassandra`.
+ 
-  
- The Cassandra configuration files can be found in the `conf` directory of binary and source distributions.
- If you have installed Cassandra from a deb or rpm package, the configuration files will be located in `/etc/cassandra`.
-  
  === Step 2.1: Directories Used by Cassandra ===
+ If you've installed Cassandra with a deb or rpm package, the directories that Cassandra will use should already be created and have the correct permissions. Otherwise, you will want to check the following config settings from `conf/cassandra.yaml`: `data_file_directories` (`/var/lib/cassandra/data`), `commitlog_directory` (`/var/lib/cassandra/commitlog`), and `saved_caches_directory` (`/var/lib/cassandra/saved_caches`).  Make sure these directories exist and can be written to.
- If you've installed Cassandra with a deb or rpm package, the directories that Cassandra will use should already be created an have the correct permissions. Otherwise, you will want to check the following config settings.
- 
- In `conf/cassandra.yaml` you will find the following configuration options: `data_file_directories` (`/var/lib/cassandra/data`), `commitlog_directory` (`/var/lib/cassandra/commitlog`), and `saved_caches_directory` (`/var/lib/cassandra/saved_caches`).  Make sure these directories exist and can be written to.
  
  By default, Cassandra will write its logs in `/var/log/cassandra/`.  Make sure this directory exists and is writable, or change this line in `conf/log4j-server.properties`:
+ 
  {{{
  log4j.appender.R.File=/var/log/cassandra/system.log
  }}}
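
The directory checks described in this step can be scripted. The following is a minimal sketch, not part of Cassandra: the function name `find_unusable_dirs` is made up for illustration, and the paths are the defaults quoted in this guide, so adjust them if your `conf/cassandra.yaml` differs. Run it as the user that will run Cassandra, since permissions depend on who does the writing.

```python
import os
import tempfile

# Default locations named in this guide; change them to match your own
# cassandra.yaml and log4j settings (assumption: you kept the defaults).
DEFAULT_DIRS = [
    "/var/lib/cassandra/data",
    "/var/lib/cassandra/commitlog",
    "/var/lib/cassandra/saved_caches",
    "/var/log/cassandra",
]


def find_unusable_dirs(paths):
    """Return (path, reason) pairs for directories that are missing or not writable."""
    problems = []
    for path in paths:
        if not os.path.isdir(path):
            problems.append((path, "missing"))
            continue
        try:
            # Create (and immediately discard) a real file: this is the most
            # direct test of whether the current user can write here.
            with tempfile.TemporaryFile(dir=path):
                pass
        except OSError:
            problems.append((path, "not writable"))
    return problems


if __name__ == "__main__":
    for path, reason in find_unusable_dirs(DEFAULT_DIRS):
        print("%s: %s" % (path, reason))
```

If the script prints nothing, every listed directory exists and accepted a test file.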
+ JVM-level settings such as heap size can be set in `conf/cassandra-env.sh`.
-  
- === Step 2.2: Configure Memory Usage (Optional) ===
- By default, Cassandra will allocate memory based on physical memory your system has, using somewhere between 1/4 and 1/2 of the available RAM. 
- 
- If you want to specify how much memory Cassandra should use explicitly, edit `conf/cassandra-env.sh`, find the following lines, uncomment them, and change their values:
- {{{
- #MAX_HEAP_SIZE="4G"
- #HEAP_NEWSIZE="800M"
- }}}
- For `MAX_HEAP_SIZE` use as little as you can get away with.  It's recommended to stay within 8G because much beyond that, the CMS GC pauses interfere with normal operations.
- For `HEAP_NEWSIZE` use the number of cores * 100 but don't exceed 800M.  With too much allocated, ParNew GC pauses become detrimental.
- 
  
  == Step 3: Start Cassandra ==
  And now for the moment of truth, start up Cassandra by invoking '`bin/cassandra -f`' from the command line<<FootNote(To learn more about controlling the behavior of startup scripts, see RunningCassandra.)>>. The service should start in the foreground and log gratuitously to the console. Assuming you don't see messages with scary words like "error", or "fatal", or anything that looks like a Java stack trace, then everything should be working.
@@ -63, +46 @@

  
  If you start up Cassandra without the "-f" option, it will run in the background. You can stop the process by killing it, using '`pkill -f CassandraDaemon`', for example.
  
- == Step 4: Using cassandra-cli ==
+  . Users of recent Linux distributions and Mac OS X Snow Leopard should be able to start up Cassandra simply by untarring and invoking `bin/cassandra -f` with root privileges. Snow Leopard ships with Java 1.6.0 and does not require changing the `JAVA_HOME` environment variable or adding any directory to your `PATH`. On Linux just make sure you have a working Java JDK package installed such as the `openjdk-6-jdk` on Ubuntu Lucid Lynx.
  
+ == Step 4: Using cqlsh ==
+ `bin/cqlsh` is an interactive command line interface for Cassandra. You can define the schema and interact with data using it. Run the following command to connect to your local Cassandra instance:
- `bin/cassandra-cli` is an interactive command line interface for Cassandra. You can alter the schema and interact with data using the cli.
- Run the following command to connect to your local Cassandra instance:
- {{{
- bin/cassandra-cli
- }}}
- 
- You should see the following prompt, if successful:
- {{{
- Connected to: "Test Cluster" on 127.0.0.1/9160
- Welcome to Cassandra CLI version 1.0.7
- 
- Type 'help;' or '?' for help.
- Type 'quit;' or 'exit;' to quit.
- 
- [default@unknown] 
- }}}
- 
- You can access to the online help with 'help;' command. Commands are terminated with a semicolon (';') in the cli.
  
  {{{
- [default@unknown] help;
+ $ bin/cqlsh
  }}}
+ You should see the following prompt, if successful:
- 
- First, create a keyspace for your test.
  
  {{{
+ Connected to Test Cluster at localhost:9160.
+ [cqlsh 2.3.0 | Cassandra 1.2.2 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
+ Use HELP for help.
- [default@unknown] create keyspace DEMO
-     with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
-     and strategy_options = {replication_factor:1};  
- f53dff10-5bd8-11e1-0000-915a024292eb
- Waiting for schema agreement...
- ... schemas agree across the cluster
- [default@unknown] 
  }}}
+ For clarity, we will omit the cqlsh prompt in the following examples.
  
- Don't forget to add a semicolon (';') at end of the command.
+ You can access the online help with the 'help;' command. Commands are terminated with a semicolon (';') in cqlsh.
  
+ First, create a keyspace -- a namespace of tables.
- Second, authenticate to the DEMO keyspace:
- {{{
- [default@unknown] use DEMO;
- Authenticated to keyspace: DEMO
- [default@DEMO]
- }}}
- 
- Third, create a `Users` column family:
- {{{
- [default@DEMO] create column family Users                
- ...	with key_validation_class = 'UTF8Type'    
- ...	and comparator = 'UTF8Type'               
- ...	and default_validation_class = 'UTF8Type';
- [default@DEMO]
- }}}
- 
- Now you can store data into `Users` column family:
  
  {{{
+ CREATE KEYSPACE mykeyspace
+ WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
- [default@DEMO] set Users[1234][name] = scott;
- Value inserted.
- Elapsed time: 10 msec(s).
- [default@DEMO] set Users[1234][password] = tiger;
- Value inserted.
- Elapsed time: 10 msec(s).
- [default@DEMO]
  }}}
+ Second, switch to the new keyspace so that the statements that follow apply to it:
  
- You have inserted a row into the `Users` column family. The row key is '1234', and we set values for two columns in the row: 'name', and 'password'.
+ {{{
+ USE mykeyspace;
+ }}}
+ Third, create a `users` table:
  
+ {{{
+ CREATE TABLE users (
+   user_id int PRIMARY KEY,
+   fname text,
+   lname text
+ );
+ }}}
+ Now you can store data into `users`:
+ 
+ {{{
+ INSERT INTO users (user_id,  fname, lname)
+   VALUES (1745, 'john', 'smith');
+ INSERT INTO users (user_id,  fname, lname)
+   VALUES (1744, 'john', 'doe');
+ INSERT INTO users (user_id,  fname, lname)
+   VALUES (1746, 'john', 'smith');
+ }}}
  Now let's fetch the data you inserted:
+ 
  {{{
- [default@DEMO] get Users[1234];
- => (column=name, value=scott, timestamp=1350769161684000)
- => (column=password, value=tiger, timestamp=1350769245191000)
+ SELECT * FROM users;
+ }}}
+ You should see output reflecting your new rows:
  
- Returned 2 results.
- Elapsed time: 67 msec(s).
- [default@DEMO]
+ {{{
+  user_id | fname | lname
+ ---------+-------+-------
+     1745 |  john | smith
+     1744 |  john |   doe
+     1746 |  john | smith
  }}}
+ You can retrieve data about users whose last name is smith by creating an index, then querying the table as follows:
  
+ {{{
+ CREATE INDEX ON users (lname);
  
- You can easily specify types other than UTF-8 when creating or updating a column family. See '`help update column family;`' and '`help create column family;`' for more details.
+ SELECT * FROM users WHERE lname = 'smith';
  
- To be certain though, take some time to try out the examples in CassandraCli before moving on
- Also, if you run into problems, Don't Panic, calmly proceed to [[#if_something_goes_wrong|If Something Goes Wrong]].
-  
-  Users of recent Linux distributions and Mac OS X Snow Leopard should be able to start up Cassandra simply by untarring and invoking `bin/cassandra -f` with root privileges. Snow Leopard ships with Java 1.6.0 and does not require changing the `JAVA_HOME` environment variable or adding any directory to your `PATH`. On Linux just make sure you have a working Java JDK package installed such as the `openjdk-6-jdk` on Ubuntu Lucid Lynx.
- 
+  user_id | fname | lname
+ ---------+-------+-------
+     1745 |  john | smith
+     1746 |  john | smith
+ }}}
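
The `INSERT` statements in Step 4 follow a regular pattern, so they can be generated rather than typed. The sketch below is illustrative only: `cql_literal` and `insert_statement` are hypothetical helper names, not part of cqlsh or any driver, and only `int` and `text` values are handled. Table and column names are pasted in unescaped, so use it only with names you trust.

```python
def cql_literal(value):
    """Render a Python int or str as a CQL literal."""
    if isinstance(value, int):
        return str(value)
    # CQL string literals use single quotes; an embedded single quote
    # is escaped by doubling it.
    return "'" + value.replace("'", "''") + "'"


def insert_statement(table, row):
    """Build one INSERT statement from a list of (column, value) pairs."""
    columns = ", ".join(name for name, _ in row)
    values = ", ".join(cql_literal(value) for _, value in row)
    return "INSERT INTO %s (%s) VALUES (%s);" % (table, columns, values)


if __name__ == "__main__":
    # The same three rows used in the walk-through above.
    users = [(1745, "john", "smith"), (1744, "john", "doe"), (1746, "john", "smith")]
    for user_id, fname, lname in users:
        row = [("user_id", user_id), ("fname", fname), ("lname", lname)]
        print(insert_statement("users", row))
```

The printed statements can be pasted into a cqlsh session after `USE mykeyspace;`. If your cqlsh version offers an option for running statements from a file, saving the output to a file is another way to replay it; check `bin/cqlsh --help` before relying on that.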
- == Configuring Multinode Cluster ==
+ == Configuring Multinode Clusters ==
- 
  Now you have a single working Cassandra node, which is a Cassandra cluster with only one node. By adding more nodes, you can make it a multinode cluster.
  
  Setting up a Cassandra cluster is ''almost'' as simple as repeating the above procedures  for each node in your cluster. There are a few minor exceptions though.
-  
+ 
  Cassandra nodes exchange information about one another using a mechanism called Gossip, but to get the ball rolling a newly started node needs to know of at least one other node; this is called a '''Seed'''. It's customary to pick a small number of relatively stable nodes to serve as your seeds, but there is no hard-and-fast rule here. Do make sure that each seed also knows of at least one other seed. Remember, the goal is to avoid a chicken-and-egg scenario and provide an avenue for all nodes in the cluster to discover one another.
-  
- In addition to seeds, you'll also need to configure the IP interface to listen on for Gossip and Thrift, ('''listen_address''' and '''rpc_address''' respectively). Use a 'listen_address` that will be reachable from the `listen_address` used on all other nodes, and a `rpc_address` that will be accessible to clients.
-  
- One other thing you need to care at multi node cluster is '''Token'''. Each node in the cluster owns a part of token range  from 0 to 2^127-1. 
- If the Nth node in the cluster has token value T(N), the node owns range from T(N-1)+1 to T(N).  Cassandra decide nodes where a data should be stored based on the consistent mapping of the row key and token range (refer to RandomPartitioner, ByteOrderedPartitioner). 
  
+ In addition to seeds, you'll also need to configure the IP interfaces to listen on for Gossip and CQL ('''listen_address''' and '''rpc_address''' respectively). Use a `listen_address` that will be reachable from the `listen_address` used on all other nodes, and an `rpc_address` that will be accessible to clients.
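
The advice that each seed should know of at least one other seed can be checked before starting the cluster. The following is a rough sketch under stated assumptions: `seeds_without_peers` is a hypothetical helper, and the input is a hand-written mapping from each node's address to the seed addresses configured in that node's `cassandra.yaml`. It does not read any configuration files itself.

```python
def seeds_without_peers(seed_lists):
    """Return the seeds whose own configuration names no seed besides themselves.

    seed_lists maps a node address to the list of seed addresses configured
    on that node. A seed with no entry in seed_lists is also reported,
    because its configuration is unknown.
    """
    all_seeds = set()
    for seeds in seed_lists.values():
        all_seeds.update(seeds)

    isolated = []
    for seed in sorted(all_seeds):
        configured = seed_lists.get(seed, [])
        others = [address for address in configured if address != seed]
        if not others:
            isolated.append(seed)
    return isolated


# Illustrative addresses only; 192.168.0.10 lists nothing but itself.
example = {
    "192.168.0.10": ["192.168.0.10"],
    "192.168.0.11": ["192.168.0.10", "192.168.0.11"],
    "192.168.0.12": ["192.168.0.10", "192.168.0.11"],
}

if __name__ == "__main__":
    print(seeds_without_peers(example))  # ['192.168.0.10']
```

An empty result means every seed can reach at least one other seed through its own configuration. A cluster with only one seed will always be reported, which is expected for a first test setup.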
- The token can be assigned to node by '''initial_token''' parameter in cassandra.yaml. The parameter is effective only at the first boot of the node. Once you boot a node, use 'nodetool move' command to change the assigned token.  You need to specify appropriate initial_token for each node to balance data load across the nodes.  Here is a python script to calculate balanced tokens.
- {{{
- # Number of nodes in the cluster
- num_node = 4
- 
- for n in range(num_node):
-     print int(2**127 / num_node * n)
- }}}
  
  Once everything is configured and the nodes are running, use the `bin/nodetool ring` utility to verify a properly connected cluster. For example:
-  
+ 
  {{{
- eevans@achilles:~$ bin/nodetool -host 192.168.0.10 -p 7199 ring
+ eevans@achilles:~$ bin/nodetool -host 192.168.0.10 -p 7199 status
- Address         DC      Rack    Status State   Load        Owns    Token                                       
-                                                                    127605887595351923798765477786913079296     
- 192.168.0.10    DC1     r1      Up     Normal  17.3 MB     25.00%  0                                           
- 192.168.0.11    DC1     r1      Up     Normal  17.4 MB     25.00%  42535295865117307932921825928971026432      
- 192.168.0.12    DC1     r1      Up     Normal  37.2 MB     25.00%  85070591730234615865843651857942052864      
- 192.168.0.13    DC1     r1      Up     Normal  24.55 MB    25.00%  127605887595351923798765477786913079296     
+ Datacenter: datacenter1
+ =======================
+ Status=Up/Down
+ |/ State=Normal/Leaving/Joining/Moving
+ --  Address    Load       Tokens  Owns   Host ID                               Rack
+ UN  127.0.0.3  30.99 KB   256     32.4%  92b20e08-9ddd-4f55-9173-8516e74d27f5  rack1
+ UN  127.0.0.2  31 KB      256     31.5%  b9616658-c744-48fb-b64f-83f96b007d93  rack1
+ UN  127.0.0.1  30.96 KB   256     36.1%  f7a08973-85bd-460f-8176-d6f9df8c23f4  rack1
  }}}
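
Output in the `nodetool status` layout shown above can be scanned automatically for nodes that are not Up and Normal. This is a minimal sketch that assumes the column layout matches the sample in this guide (a two-letter status/state code followed by the address); `nodes_not_up_normal` is a made-up name, and the sample text below is the guide's output with one node changed to `DN` purely for illustration.

```python
def nodes_not_up_normal(status_output):
    """Return the addresses of nodes whose status/state code is not 'UN'."""
    unhealthy = []
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        code = fields[0]
        # Node rows start with a two-letter code: status (Up/Down) followed
        # by state (Normal/Leaving/Joining/Moving). Skip header lines.
        if len(code) == 2 and code[0] in "UD" and code[1] in "NLJM":
            if code != "UN":
                unhealthy.append(fields[1])
    return unhealthy


# Sample adapted from this guide; the second node is marked DN on purpose.
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns   Host ID                               Rack
UN  127.0.0.3  30.99 KB   256     32.4%  92b20e08-9ddd-4f55-9173-8516e74d27f5  rack1
DN  127.0.0.2  31 KB      256     31.5%  b9616658-c744-48fb-b64f-83f96b007d93  rack1
UN  127.0.0.1  30.96 KB   256     36.1%  f7a08973-85bd-460f-8176-d6f9df8c23f4  rack1
"""

if __name__ == "__main__":
    print(nodes_not_up_normal(SAMPLE))  # ['127.0.0.2']
```

An empty list means every node row reported `UN`. Other Cassandra versions may format the output differently, so treat the parsing rule as a starting point.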
  Advanced cluster management is described in [[Operations]].
-  
- If you don't yet have access to hardware for a Cassandra cluster you can try it out on EC2 with CloudConfig.
  
+ If you don't yet have access to hardware for a real Cassandra cluster, you can manage local clusters easily with [[https://github.com/pcmanus/ccm|ccm]] (Cassandra Cluster Manager).
+ 
- For more details about configuring multi node cluster, please refer to [[MultinodeCluster]].
+ For more details about configuring a multinode cluster, please refer to MultinodeCluster.
-  
+ 
  == Write your application ==
+ Review the resources on DataModeling.  The full CQL documentation is [[http://www.datastax.com/documentation/cql/3.0/webhelp/index.html|here]].
+ 
+ DataStax sponsors development of the CQL drivers at https://github.com/datastax.  The full list of CQL drivers is on the ClientOptions page.
+ 
- The recommended way to communicate with Cassandra in your application is to use a [[http://wiki.apache.org/cassandra/ClientOptions|higher-level client]]. These provide programming language specific API:s for talking to Cassandra in a variety of languages. The details will vary depending on programming language and client, but in general using a higher-level client will mean that you have to write less code and get several features for free that you would otherwise have to write yourself.
-  
- That said, it is useful to know that Cassandra uses [[http://thrift.apache.org/|Thrift]] for its external client-facing API. Cassandra's main API/RPC/Thrift port is 9160. Thrift supports a [[http://svn.apache.org/viewvc/thrift/trunk/lib/|wide variety of languages]] so you can code your application to use Thrift directly if you so chose (but again we recommend a [[http://wiki.apache.org/cassandra/ClientOptions|high-level client]] where available).
-  
- Important note: If you intend to use thrift directly, you need to install a version of thrift that matches the revision that your version of Cassandra uses. InstallThrift
-  
- Cassandra's main API/RPC/Thrift port is 9160 by default, which is defined as rpc_port in cassandra.yaml. It is a common mistake for API clients to connect to the JMX port instead.
-  
- Checking out a demo application like [[http://github.com/twissandra/twissandra|Twissandra]] (Python + Django) will also be useful.
-  
  <<Anchor(if_something_goes_wrong)>>
-  
+ 
  == If Something Goes Wrong ==
  If you followed the steps in this guide and failed to get up and running, we'd love to help. Here's what we need.
-  
+ 
   1. If you are running anything other than a stable release, please upgrade first and see if you can still reproduce the problem.
   1. Make sure debug logging is enabled (hint: `conf/log4j.properties`) and save a copy of the output.
   1. Search the [[http://news.gmane.org/gmane.comp.db.cassandra.user|mailing list archive]] and see if anyone has reported a similar problem and what, if any, resolution they received.
   1. Ditto for the [[https://issues.apache.org/jira/browse/CASSANDRA|bug tracking system]].
   1. See if you can put together a unit test, script, or application that reproduces the problem.
-  
+ 
  Finally, post a message with all relevant details to the list ([[mailto:user-subscribe@cassandra.apache.org|subscription required]]), or hop onto [[http://webchat.freenode.net/?channels=#cassandra|IRC]] (network irc.freenode.net, channel #cassandra) and let us know.
-  
+ 
  <<BR>> <<BR>>
   
  ----