Posted to commits@cassandra.apache.org by br...@apache.org on 2011/04/29 21:28:07 UTC
svn commit: r1097922 - in /cassandra/branches/cassandra-0.7/contrib/pig: README.txt example-script.pig
Author: brandonwilliams
Date: Fri Apr 29 19:28:06 2011
New Revision: 1097922
URL: http://svn.apache.org/viewvc?rev=1097922&view=rev
Log:
Update pig example script to work again.
Patch by Jeremy Hanna, reviewed by brandonwilliams for CASSANDRA-2487
Modified:
cassandra/branches/cassandra-0.7/contrib/pig/README.txt
cassandra/branches/cassandra-0.7/contrib/pig/example-script.pig
Modified: cassandra/branches/cassandra-0.7/contrib/pig/README.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/contrib/pig/README.txt?rev=1097922&r1=1097921&r2=1097922&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/contrib/pig/README.txt (original)
+++ cassandra/branches/cassandra-0.7/contrib/pig/README.txt Fri Apr 29 19:28:06 2011
@@ -18,17 +18,22 @@ also set PIG_CONF_DIR to the location of
Finally, set the following as environment variables (uppercase,
underscored), or as Hadoop configuration variables (lowercase, dotted):
-* PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on
* PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
+* PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on
* PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
-Run:
+For example, against a local node with the default settings, you'd use:
+export PIG_INITIAL_ADDRESS=localhost
+export PIG_RPC_PORT=9160
+export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
+
+Then you can build and run it like this:
contrib/pig$ ant
contrib/pig$ bin/pig_cassandra -x local example-script.pig
This will run the test script against your Cassandra instance
-and will assume that there is a Keyspace1/Standard1 with some
+and will assume that there is a MyKeyspace/MyColumnFamily with some
data in it. It will run in local mode (see pig docs for more info).
If you'd like to get to a 'grunt>' shell prompt, run:
@@ -38,24 +43,24 @@ contrib/pig$ bin/pig_cassandra -x local
Once the 'grunt>' shell has loaded, try a simple program like the
following, which will determine the top 50 column names:
-grunt> rows = LOAD 'cassandra://Keyspace1/Standard1' USING CassandraStorage();
-grunt> cols = FOREACH rows GENERATE flatten($1);
+grunt> rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
+grunt> cols = FOREACH rows GENERATE flatten(columns);
grunt> colnames = FOREACH cols GENERATE $0;
-grunt> namegroups = GROUP colnames BY $0;
+grunt> namegroups = GROUP colnames BY (chararray) $0;
grunt> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
grunt> orderednames = ORDER namecounts BY $0;
grunt> topnames = LIMIT orderednames 50;
grunt> dump topnames;
Slices on columns can also be specified:
-grunt> rows = LOAD 'cassandra://Keyspace1/Standard1&slice_start=C2&slice_end=C4&i&limit=1&reversed=true' USING CassandraStorage();
+grunt> rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily&slice_start=C2&slice_end=C4&i&limit=1&reversed=true' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
Binary values for slice_start and slice_end can be escaped such as '\u0255'
Outputting to Cassandra requires the same format from input, so the simplest example is:
-grunt> rows = LOAD 'cassandra://Keyspace1/Standard1' USING CassandraStorage();
-grunt> STORE rows into 'cassandra://Keyspace1/Standard2' USING CassandraStorage();
+grunt> rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage();
+grunt> STORE rows into 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage();
Which will copy the ColumnFamily. Note that the destination ColumnFamily must
already exist for this to work.
Modified: cassandra/branches/cassandra-0.7/contrib/pig/example-script.pig
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/contrib/pig/example-script.pig?rev=1097922&r1=1097921&r2=1097922&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/contrib/pig/example-script.pig (original)
+++ cassandra/branches/cassandra-0.7/contrib/pig/example-script.pig Fri Apr 29 19:28:06 2011
@@ -1,7 +1,7 @@
-rows = LOAD 'cassandra://Keyspace1/Standard1' USING CassandraStorage();
-cols = FOREACH rows GENERATE flatten($1);
+rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
+cols = FOREACH rows GENERATE flatten(columns);
colnames = FOREACH cols GENERATE $0;
-namegroups = GROUP colnames BY $0;
+namegroups = GROUP colnames BY (chararray) $0;
namecounts = FOREACH namegroups GENERATE COUNT($1), group;
orderednames = ORDER namecounts BY $0;
topnames = LIMIT orderednames 50;