Posted to user@cassandra.apache.org by Don Smith <ds...@likewise.com> on 2011/10/13 18:39:33 UTC

Re: Efficiency of hector's setRowCount (and setStartKey!)

It's actually setStartKey that's the important method call (in
combination with setRowCount). I should have been clearer.

The following code performs as expected: it returns the expected data
in the expected order. I believe that using IndexedSlicesQuery's
setStartKey supports efficient queries, avoiding re-pulling the entire
data set from Cassandra. Correct?


    void demoPaging() {
        String lastKey = processPage("don", "");   // get first batch, starting with "" (smallest key)
        lastKey = processPage("don", lastKey);     // get second batch, starting with previous last key
        lastKey = processPage("don", lastKey);     // get third batch, starting with previous last key
        //....
    }

    // return last key processed, null when no records left
    String processPage(String username, String startKey) {
        String lastKey = null;
        IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
            HFactory.createIndexedSlicesQuery(keyspace, stringSerializer,
                                              stringSerializer, stringSerializer);
        indexedSlicesQuery.addEqualsExpression("user", username);
        indexedSlicesQuery.setColumnNames("source", "ip");
        indexedSlicesQuery.setColumnFamily(ourColumnFamilyName);
        indexedSlicesQuery.setStartKey(startKey);   // <----------
        indexedSlicesQuery.setRowCount(batchSize);
        QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
        OrderedRows<String, String, String> rows = result.get();
        for (Row<String, String, String> row : rows) {
            if (row == null) { continue; }
            totalCount++;
            String key = row.getKey();

            if (!startKey.equals(key)) { lastKey = key; }
        }
        totalCount--;
        return lastKey;
    }
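
For anyone who wants to try the paging loop without a cluster, here is a minimal, self-contained sketch of the same idea. It does not use Hector: PagingSketch, fetchPage and collectAll are hypothetical names, and an in-memory TreeSet stands in for the index query. It assumes, as the `!startKey.equals(key)` check above suggests, that the start key is inclusive, so every page after the first begins with a row that was already seen. It also uses sorted string order only for simplicity; real row order depends on the partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class PagingSketch {

    // Hypothetical stand-in for executing the query: returns at most rowCount
    // keys, beginning at startKey INCLUSIVE (an assumption about the real call).
    static List<String> fetchPage(NavigableSet<String> keys, String startKey, int rowCount) {
        List<String> page = new ArrayList<>();
        for (String key : keys.tailSet(startKey, true)) {
            if (page.size() == rowCount) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Pages through all keys, fetching only one page per call and dropping
    // the row that overlaps with the previous page.
    static List<String> collectAll(NavigableSet<String> keys, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<String> seen = new ArrayList<>();
        String startKey = "";
        boolean firstPage = true;
        while (true) {
            // Later pages ask for one extra row because their first row repeats.
            int rowCount = firstPage ? batchSize : batchSize + 1;
            List<String> page = fetchPage(keys, startKey, rowCount);
            boolean overlaps = !firstPage && !page.isEmpty() && page.get(0).equals(startKey);
            seen.addAll(page.subList(overlaps ? 1 : 0, page.size()));
            if (page.size() < rowCount) {
                break; // a short page means there is nothing left to fetch
            }
            startKey = page.get(page.size() - 1);
            firstPage = false;
        }
        return seen;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>(Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println(collectAll(keys, 2));
    }
}
```

Stopping on a short page, rather than on a null last key, avoids passing null back in as the next start key.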






On 10/13/2011 09:15 AM, Patricio Echagüe wrote:
> Hi Don. No, it will not. IndexedSlicesQuery reads only the number of
> rows specified by setRowCount and goes back to the DB to fetch the
> next page when needed.
>
> setRowCount simply calls indexClause.setCount(rowCount);
>
> On Mon, Oct 10, 2011 at 3:52 PM, Don Smith <dsmith@likewise.com> wrote:
>
>     Hector's IndexedSlicesQuery has a setRowCount method that you can
>     use to page through the results, as described in
>     https://github.com/rantav/hector/wiki/User-Guide .
>
>         rangeSlicesQuery.setRowCount(1001);
>          .....
>         rangeSlicesQuery.setKeys(lastRow.getKey(),  "");
>
>     Is it efficient?  Specifically, suppose my query returns 100,000
>     results and I page through batches of 1000 at a time (making 100
>     executes of the query). Will it internally retrieve all the
>     results each time (but pass only the desired set of 1000 or so to
>     me)? Or will it optimize queries to avoid the duplication?      I
>     presume the latter. :)
>
>     Can IndexedSlicesQuery's setStartKey method be used for the same
>     effect?
>
>       Thanks,  Don
>
>


Re: Efficiency of hector's setRowCount (and setStartKey!)

Posted by Patricio Echagüe <pa...@gmail.com>.
On Thu, Oct 13, 2011 at 9:39 AM, Don Smith <ds...@likewise.com> wrote:

> It's actually setStartKey that's the important method call (in combination
> with setRowCount). So I should have been clearer.
>
> The following code performs as expected: it returns the expected data
> in the expected order. I believe that using IndexedSlicesQuery's
> setStartKey supports efficient queries, avoiding re-pulling the entire
> data set from Cassandra. Correct?
>

correct

>
>
>     void demoPaging() {
>         String lastKey = processPage("don", "");   // get first batch, starting with "" (smallest key)
>         lastKey = processPage("don", lastKey);     // get second batch, starting with previous last key
>         lastKey = processPage("don", lastKey);     // get third batch, starting with previous last key
>         //....
>     }
>
>     // return last key processed, null when no records left
>     String processPage(String username, String startKey) {
>         String lastKey = null;
>         IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
>             HFactory.createIndexedSlicesQuery(keyspace, stringSerializer,
>                                               stringSerializer, stringSerializer);
>         indexedSlicesQuery.addEqualsExpression("user", username);
>         indexedSlicesQuery.setColumnNames("source", "ip");
>         indexedSlicesQuery.setColumnFamily(ourColumnFamilyName);
>         indexedSlicesQuery.setStartKey(startKey);   // <----------
>         indexedSlicesQuery.setRowCount(batchSize);
>         QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
>         OrderedRows<String, String, String> rows = result.get();
>         for (Row<String, String, String> row : rows) {
>             if (row == null) { continue; }
>             totalCount++;
>             String key = row.getKey();
>
>             if (!startKey.equals(key)) { lastKey = key; }
>         }
>         totalCount--;
>         return lastKey;
>     }
>
>
>
>
>
>
> On 10/13/2011 09:15 AM, Patricio Echagüe wrote:
>
> Hi Don. No, it will not. IndexedSlicesQuery reads only the number of
> rows specified by setRowCount and goes back to the DB to fetch the
> next page when needed.
>
> setRowCount simply calls indexClause.setCount(rowCount);
>
> On Mon, Oct 10, 2011 at 3:52 PM, Don Smith <ds...@likewise.com> wrote:
>
>> Hector's IndexedSlicesQuery has a setRowCount method that you can use to
>> page through the results, as described in
>> https://github.com/rantav/hector/wiki/User-Guide .
>>
>>     rangeSlicesQuery.setRowCount(1001);
>>      .....
>>     rangeSlicesQuery.setKeys(lastRow.getKey(),  "");
>>
>> Is it efficient?  Specifically, suppose my query returns 100,000 results
>> and I page through batches of 1000 at a time (making 100 executes of the
>> query). Will it internally retrieve all the results each time (but pass only
>> the desired set of 1000 or so to me)? Or will it optimize queries to avoid
>> the duplication?      I presume the latter. :)
>>
>> Can IndexedSlicesQuery's setStartKey method be used for the same effect?
>>
>>   Thanks,  Don
>>
>
>
>