Posted to user@cassandra.apache.org by Keith Freeman <8f...@gmail.com> on 2015/07/17 18:29:58 UTC

Java Driver paging slower than manual/token paging?

We've recently started upgrading from Cassandra 1.2.12 to 2.1.7.  In 
1.2.12 we wrote code that used the well-known token-based pagination 
pattern to process all rows in one of our tables.  For 2.1.7 we tried 
replacing that code with the Java driver's new built-in paging:

>         List<Row> queryRows = new ArrayList<>();
>         String query = "select * from " + schema + "." + table;
>         Statement stmt = new SimpleStatement(query);
>         stmt.setFetchSize(rowLimit);
>         ResultSet rs = session.execute(stmt);
>         for (Row row : rs)
>         {
>             queryRows.add(row);
>             int avail = rs.getAvailableWithoutFetching();
>             if ((!rs.isFullyFetched()) && (avail <= rowLimit - 10))
>             {
>                 rs.fetchMoreResults(); // async
>             }
>
>             if (avail == 0)
>             {
>                 processor.process(queryRows);
>                 queryRows.clear();
>             }
>         }

The schema:
> create table x.messages (
>
> sourceday           text,       // partition-key
> seqnumber           int,        // partition-key
>
> sourcetimeus        bigint,     // clustering-key
> unique              bigint,     // clustering-key
>
> tags                set<text>,
> dc                  text,
> sc                  set<text>,
>
> dn                  text,
> type                text,
> subtype             text,
> das                 int,
>
> ingesttimems        bigint,
> vs                  int,
>
> chunknum            bigint,
>
> humantext           text,
> fields              map<text, text>,
>
> primary key ((sourceday, seqnumber), sourcetimeus, unique)
> )
> with clustering order by (sourcetimeus ASC, unique ASC) and 
> compression = { 'sstable_compression' : 'LZ4Compressor' };

Messages average about 1 KB in size (most of that in the "fields" map).

In this test, the processor.process() call just prints a progress 
message to System.out.

In a direct comparison reading our test data set (24.1M rows on a single 
node) we see (average of 3 runs each):

  * old paging: 908 seconds, 26k rows/sec
  * new paging: 1044 seconds, 23k rows/sec
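
As a quick check of the arithmetic behind these figures, here is a small sketch that recomputes the throughput from the row count and run times quoted above. The class and method names are made up for illustration. It shows that the new paging delivers about 13% fewer rows per second, which is the same thing as taking about 15% longer in wall-clock time.

```java
// Hypothetical helper: recomputes the figures quoted in this post.
final class PagingThroughput {

    // Rows processed per second for a full-table read.
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    // Fraction of throughput lost by the candidate relative to the baseline.
    // Row count cancels out, so only the two run times are needed.
    static double throughputLoss(double baselineSeconds, double candidateSeconds) {
        return 1.0 - baselineSeconds / candidateSeconds;
    }

    // Fraction of extra wall-clock time the candidate takes over the baseline.
    static double extraTime(double baselineSeconds, double candidateSeconds) {
        return candidateSeconds / baselineSeconds - 1.0;
    }

    public static void main(String[] args) {
        long rows = 24_100_000L;  // test data set size
        double oldSecs = 908.0;   // token-based paging, average of 3 runs
        double newSecs = 1044.0;  // driver paging, average of 3 runs

        System.out.printf("old paging: %.0f rows/sec%n", rowsPerSecond(rows, oldSecs));
        System.out.printf("new paging: %.0f rows/sec%n", rowsPerSecond(rows, newSecs));
        System.out.printf("throughput loss: %.1f%%%n", 100 * throughputLoss(oldSecs, newSecs));
        System.out.printf("extra run time:  %.1f%%%n", 100 * extraTime(oldSecs, newSecs));
    }
}
```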


Is this roughly 13% slowdown with the new paging known/expected?  If 
not, how would we diagnose the cause?  We'd definitely prefer to use the 
new paging since the code is MUCH simpler.
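
One possible first step toward a diagnosis, offered only as a sketch: measure how much of the run the client spends waiting inside the result iterator (where the driver blocks whenever the next page has not arrived yet) as opposed to in our own processing. The wrapper below is hypothetical and not part of the driver. It works on any java.util.Iterator, so it is shown here with a plain list; the assumption is that it would be wrapped around rs.iterator() in the loop above.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical wrapper: accumulates the time spent inside hasNext()/next()
// of the wrapped iterator, i.e. time the caller spends waiting for items.
final class TimedIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private long waitNanos = 0;
    private long items = 0;

    TimedIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        long start = System.nanoTime();
        try {
            return delegate.hasNext();
        } finally {
            waitNanos += System.nanoTime() - start;
        }
    }

    @Override
    public T next() {
        long start = System.nanoTime();
        try {
            T item = delegate.next();
            items++;  // count only items that were actually returned
            return item;
        } finally {
            waitNanos += System.nanoTime() - start;
        }
    }

    @Override
    public void remove() {
        delegate.remove();
    }

    long waitNanos() { return waitNanos; }
    long items() { return items; }

    public static void main(String[] args) {
        // Stand-in data; in the real test this would be rs.iterator().
        TimedIterator<Integer> it =
                new TimedIterator<>(Arrays.asList(1, 2, 3).iterator());
        while (it.hasNext()) {
            it.next();
        }
        System.out.println(it.items() + " items, "
                + it.waitNanos() / 1_000_000.0 + " ms spent in the iterator");
    }
}
```

If the time spent in the iterator accounts for most of the extra 136 seconds, that would point at page fetches not overlapping with processing; if not, the difference would have to be elsewhere.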