Posted to user@cassandra.apache.org by Don Smith <ds...@likewise.com> on 2011/10/13 18:39:33 UTC
Re: Efficiency of hector's setRowCount (and setStartKey!)
It's actually setStartKey that's the important method call (in
combination with setRowCount), so I should have been clearer.
The following code performs as expected: it returns the expected
data in the expected order. I believe that using
IndexedSlicesQuery's setStartKey will support efficient queries,
avoiding re-pulling the entire data set from Cassandra. Correct?
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.IndexedSlicesQuery;
import me.prettyprint.hector.api.query.QueryResult;

// Assumes these fields exist elsewhere in the class: keyspace, stringSerializer,
// ourColumnFamilyName, batchSize (at least 2, since one row per page repeats) and totalCount.

void demoPaging() {
    // First batch, starting with "" (smallest key).
    String lastKey = processPage("don", "");
    // Each further batch starts with the previous last key; stop when nothing new comes back.
    while (lastKey != null) {
        lastKey = processPage("don", lastKey);
    }
}

// Return the last key processed, or null when no new records are left.
String processPage(String username, String startKey) {
    String lastKey = null;
    IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
        HFactory.createIndexedSlicesQuery(keyspace, stringSerializer,
            stringSerializer, stringSerializer);
    indexedSlicesQuery.addEqualsExpression("user", username);
    indexedSlicesQuery.setColumnNames("source", "ip");
    indexedSlicesQuery.setColumnFamily(ourColumnFamilyName);
    indexedSlicesQuery.setStartKey(startKey); // <-- the important call
    indexedSlicesQuery.setRowCount(batchSize);
    QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
    OrderedRows<String, String, String> rows = result.get();
    for (Row<String, String, String> row : rows) {
        if (row == null) { continue; }
        String key = row.getKey();
        // The page starts at startKey itself, which was already counted on the
        // previous page, so skip it instead of decrementing totalCount afterwards
        // (a blanket decrement undercounts the first page, where nothing repeats).
        if (startKey.equals(key)) { continue; }
        totalCount++;
        lastKey = key;
    }
    return lastKey;
}
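To check the overlap handling in processPage without a running cluster, here is a minimal sketch, assuming a TreeSet of row keys stands in for the index result and that each query returns up to rowCount keys beginning at the start key itself (inclusive), which is what the code above assumes when it skips a row equal to startKey. KeysetPagingSketch, fetchPage and readAll are hypothetical names; nothing here calls Hector. Cassandra may return rows in partitioner (token) order rather than sorted key order; the sketch only relies on that order being the same for every query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Stand-in for setStartKey + setRowCount paging over an ordered set of keys (no Hector involved).
final class KeysetPagingSketch {

    // Hypothetical matching row keys, in the order the store returns them.
    private final NavigableSet<String> matchingKeys;

    KeysetPagingSketch(NavigableSet<String> matchingKeys) {
        this.matchingKeys = matchingKeys;
    }

    // One simulated execute(): at most rowCount keys, starting AT startKey (inclusive).
    // An empty start key means "from the beginning".
    List<String> fetchPage(String startKey, int rowCount) {
        List<String> page = new ArrayList<>();
        for (String key : matchingKeys.tailSet(startKey, true)) {
            if (page.size() == rowCount) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Same shape as demoPaging/processPage: skip the repeated start key and
    // stop when a page brings back no new keys.
    List<String> readAll(int rowCount) {
        if (rowCount < 2) {
            throw new IllegalArgumentException("rowCount must be at least 2");
        }
        List<String> seen = new ArrayList<>();
        String startKey = "";
        while (true) {
            String lastKey = null;
            for (String key : fetchPage(startKey, rowCount)) {
                if (key.equals(startKey)) {
                    continue; // already processed as the last key of the previous page
                }
                seen.add(key);
                lastKey = key;
            }
            if (lastKey == null) {
                return seen; // nothing new: all rows have been read
            }
            startKey = lastKey;
        }
    }

    public static void main(String[] args) {
        KeysetPagingSketch sketch = new KeysetPagingSketch(
            new TreeSet<>(List.of("a", "b", "c", "d", "e", "f", "g")));
        // Pages fetched: [a, b, c] [c, d, e] [e, f, g] [g]
        System.out.println(sketch.readAll(3)); // prints [a, b, c, d, e, f, g]
    }
}
```

With a rowCount of 1, every page after the first would contain only the repeated start key, so the loop would stop after a single row; the sketch rejects that value, and the same reasoning suggests keeping batchSize above 1 in the real query.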
On 10/13/2011 09:15 AM, Patricio Echagüe wrote:
> Hi Don. No, it will not. IndexedSlicesQuery will read only the number
> of rows specified by setRowCount and will go to the DB to get the next
> page when needed.
>
> setRowCount calls indexClause.setCount(rowCount);
>
> On Mon, Oct 10, 2011 at 3:52 PM, Don Smith <dsmith@likewise.com> wrote:
>
> Hector's IndexedSlicesQuery has a setRowCount method that you can
> use to page through the results, as described in
> https://github.com/rantav/hector/wiki/User-Guide .
>
> rangeSlicesQuery.setRowCount(1001);
> .....
> rangeSlicesQuery.setKeys(lastRow.getKey(), "");
>
> Is it efficient? Specifically, suppose my query returns 100,000
> results and I page through batches of 1000 at a time (making 100
> executions of the query). Will it internally retrieve all the
> results each time (but pass only the desired set of 1000 or so to
> me)? Or will it optimize queries to avoid the duplication? I
> presume the latter. :)
>
> Can IndexedSlicesQuery's setStartKey method be used for the same
> effect?
>
> Thanks, Don
>
>
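To put rough numbers on the scenario in the question (100,000 matching rows, pages of about 1000), here is a back-of-the-envelope cost model. It assumes the start key is inclusive, so every page after the first repeats one row; that is presumably why a row count of 1001 yields 1000 new rows per query. It also assumes the loop stops at the first page shorter than rowCount. PagingCost and its methods are hypothetical helpers, and rowsReadIfRescanning models a hypothetical naive approach in which each query re-reads everything before its page, which per the reply above is not what IndexedSlicesQuery does.

```java
// Rough cost model for keyset paging with an inclusive start key (no Hector calls).
final class PagingCost {

    // Queries issued when the loop stops at the first page with fewer than rowCount rows.
    // Page k (1-based) starts at row (k - 1) * (rowCount - 1) because each page
    // begins with the previous page's last row.
    static long queries(long totalRows, long rowCount) {
        if (rowCount < 2) {
            throw new IllegalArgumentException("rowCount must be at least 2");
        }
        if (totalRows < rowCount) {
            return 1;
        }
        return (totalRows - rowCount) / (rowCount - 1) + 2;
    }

    // Every row is read once, plus one repeated start-key row on each page after the first.
    static long rowsRead(long totalRows, long rowCount) {
        return totalRows + queries(totalRows, rowCount) - 1;
    }

    // Hypothetical naive alternative: query p re-reads all rows up to the end of page p.
    static long rowsReadIfRescanning(long totalRows, long pageSize) {
        long pages = (totalRows + pageSize - 1) / pageSize;
        long total = 0;
        for (long p = 1; p <= pages; p++) {
            total += Math.min(p * pageSize, totalRows);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(queries(100_000, 1001));              // 100
        System.out.println(rowsRead(100_000, 1001));             // 100099
        System.out.println(rowsReadIfRescanning(100_000, 1000)); // 5050000
    }
}
```

Under these assumptions the start-key approach reads about 100,099 rows in 100 queries, against roughly 5 million for the naive re-read. A loop that stops only when a page brings no new rows, as in the code earlier in this message, may issue one extra query that returns just the final key.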
Re: Efficiency of hector's setRowCount (and setStartKey!)
Posted by Patricio Echagüe <pa...@gmail.com>.
On Thu, Oct 13, 2011 at 9:39 AM, Don Smith <ds...@likewise.com> wrote:
> It's actually setStartKey that's the important method call (in combination
> with setRowCount), so I should have been clearer.
>
> The following code performs as expected: it returns the expected data in
> the expected order. I believe that using IndexedSlicesQuery's setStartKey
> will support efficient queries, avoiding re-pulling the entire data set
> from Cassandra. Correct?
>
Correct.