Posted to commits@cassandra.apache.org by "Edward Capriolo (JIRA)" <ji...@apache.org> on 2014/02/14 02:42:19 UTC

[jira] [Created] (CASSANDRA-6704) Create wide row scanners

Edward Capriolo created CASSANDRA-6704:
------------------------------------------

             Summary: Create wide row scanners
                 Key: CASSANDRA-6704
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6704
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Edward Capriolo
            Assignee: Edward Capriolo


The BigTable white paper demonstrates the use of scanners to iterate over rows and columns. http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf

Because Cassandra does not have a primary sort order on row keys, scanning over ranges of row keys is less useful.

However, we can use the scanner concept to operate on wide rows. For example, a user often wants to do some custom processing inside a row without carrying the data across the network.

I have already implemented Thrift methods that compile dynamic Groovy code into Filters, as well as some code that uses a Filter to page through and process data on the server side.

https://github.com/edwardcapriolo/cassandra/compare/apache:trunk...trunk

The following is a working code snippet.

{code}
    @Test
    public void test_scanner() throws Exception
    {
        // Write six empty-valued columns, named 'a' through 'f', into one row.
        ColumnParent cp = new ColumnParent();
        cp.setColumn_family("Standard1");
        ByteBuffer key = ByteBuffer.wrap("rscannerkey".getBytes());
        for (char a = 'a'; a < 'g'; a++)
        {
            Column c1 = new Column();
            c1.setName(String.valueOf(a).getBytes());
            c1.setValue(new byte[0]);
            c1.setTimestamp(System.nanoTime());
            server.insert(key, cp, c1, ConsistencyLevel.ONE);
        }

        // Register a Groovy filter that keeps each column it sees and
        // signals FILTER_DONE once three columns have been collected.
        FilterDesc d = new FilterDesc();
        d.setSpec("GROOVY_CLASS_LOADER");
        d.setName("limit3");
        d.setCode("import org.apache.cassandra.dht.* \n" +
                  "import org.apache.cassandra.thrift.* \n" +
                  "public class Limit3 implements SFilter { \n" +
                  "  public FilterReturn filter(ColumnOrSuperColumn col, List<ColumnOrSuperColumn> filtered) {\n" +
                  "    filtered.add(col);\n" +
                  "    return filtered.size() < 3 ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;\n" +
                  "  } \n" +
                  "}\n");
        server.create_filter(d);

        // Scan the row starting at column 'a'; the filter stops the scan after three columns.
        ScannerResult res = server.create_scanner("Standard1", "limit3", key, ByteBuffer.wrap("a".getBytes()));
        Assert.assertEquals(3, res.results.size());
    }
{code}
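
To make the control flow easier to critique without a running cluster, here is a minimal standalone sketch of how a filter might drive server-side paging. It is a model under assumptions, not the code in the branch: only the names SFilter, FilterReturn, FILTER_MORE and FILTER_DONE come from the snippet above. The class WideRowScannerSketch, the scan method, the pageSize parameter, and the use of plain Strings in place of ColumnOrSuperColumn are all hypothetical, and the real implementation may page differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone model of the scanner idea; this is NOT the Cassandra or Thrift API.
class WideRowScannerSketch {

    // Mirrors the two return values used by the Groovy filter in the snippet above.
    enum FilterReturn { FILTER_MORE, FILTER_DONE }

    // Simplified stand-in for SFilter (assumption): columns are plain Strings
    // here rather than ColumnOrSuperColumn.
    interface SFilter {
        FilterReturn filter(String column, List<String> filtered);
    }

    // Walks one sorted row a page at a time, beginning at the first column
    // that is >= start, and stops as soon as the filter returns FILTER_DONE.
    static List<String> scan(List<String> sortedRow, String start, int pageSize, SFilter f) {
        List<String> filtered = new ArrayList<>();
        int pos = 0;
        while (pos < sortedRow.size() && sortedRow.get(pos).compareTo(start) < 0) {
            pos++;
        }
        while (pos < sortedRow.size()) {
            int end = Math.min(pos + pageSize, sortedRow.size());
            for (String column : sortedRow.subList(pos, end)) {
                if (f.filter(column, filtered) == FilterReturn.FILTER_DONE) {
                    return filtered; // the filter asked to stop; no further pages are read
                }
            }
            pos = end;
        }
        return filtered; // the row ran out before the filter asked to stop
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("a", "b", "c", "d", "e", "f");
        // Same logic as the Limit3 filter: keep every column, stop at three.
        SFilter limit3 = (col, filtered) -> {
            filtered.add(col);
            return filtered.size() < 3 ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;
        };
        System.out.println(scan(row, "a", 2, limit3)); // prints [a, b, c]
    }
}
```

In this model the filter owns both the result list and the stop condition, so only the filtered list would need to cross the network, and the scan never reads a page past the one in which the filter returns FILTER_DONE.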

I am going to be working on this code over the next few weeks, but I wanted to get the concept out early so the design can see some criticism.


