Posted to user@cassandra.apache.org by Keith Freeman <8f...@gmail.com> on 2015/07/17 18:29:58 UTC
Java Driver paging slower than manual/token paging?
We've recently started upgrading from Cassandra 1.2.12 to 2.1.7. On 1.2.12 we
wrote code that used the well-known token-based pagination pattern to
process all rows in one of our tables. For 2.1.7 we tried replacing
that code with the Java driver's built-in paging:
> List<Row> queryRows = new ArrayList<>();
> String query = "select * from " + schema + "." + table;
> Statement stmt = new SimpleStatement(query);
> stmt.setFetchSize(rowLimit); // page size requested from the server
> ResultSet rs = session.execute(stmt);
> for (Row row : rs)
> {
>     queryRows.add(row);
>     // rows still buffered client-side after consuming this one
>     int avail = rs.getAvailableWithoutFetching();
>     if ((!rs.isFullyFetched()) && (avail <= rowLimit - 10))
>     {
>         rs.fetchMoreResults(); // async prefetch of the next page
>     }
>
>     // buffer drained: hand the accumulated batch to the processor
>     if (avail == 0)
>     {
>         processor.process(queryRows);
>         queryRows.clear();
>     }
> }
The schema:
> create table x.messages (
>
>     sourceday text,      // partition-key
>     seqnumber int,       // partition-key
>
>     sourcetimeus bigint, // clustering-key
>     unique bigint,       // clustering-key
>
>     tags set<text>,
>     dc text,
>     sc set<text>,
>
>     dn text,
>     type text,
>     subtype text,
>     das int,
>
>     ingesttimems bigint,
>     vs int,
>
>     chunknum bigint,
>
>     humantext text,
>     fields map<text, text>,
>
>     primary key ((sourceday, seqnumber), sourcetimeus, unique)
> )
> with clustering order by (sourcetimeus ASC, unique ASC) and
> compression = { 'sstable_compression' : 'LZ4Compressor' };
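For readers unfamiliar with the token pattern mentioned above, here is a minimal sketch of the queries it typically issues against this schema. This is not the code from the 1.2.12 application: the class and method names are hypothetical, the limit of 1000 is an arbitrary example value, and the sketch only builds the CQL strings rather than executing them through the driver.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: builds the CQL text for token-based paging.
final class TokenPagingQueries {

    // First page: no lower bound on the partition token.
    static String firstPage(String keyspace, String table, int limit) {
        return "select * from " + keyspace + "." + table + " limit " + limit;
    }

    // Later pages: resume after the token of the last partition key seen.
    // One bind marker is emitted per partition-key column.
    static String nextPage(String keyspace, String table,
                           List<String> partitionKey, int limit) {
        String columns = String.join(", ", partitionKey);
        String markers = String.join(", ",
                Collections.nCopies(partitionKey.size(), "?"));
        return "select * from " + keyspace + "." + table
                + " where token(" + columns + ") > token(" + markers + ")"
                + " limit " + limit;
    }

    public static void main(String[] args) {
        List<String> pk = Arrays.asList("sourceday", "seqnumber");
        System.out.println(firstPage("x", "messages", 1000));
        System.out.println(nextPage("x", "messages", pk, 1000));
    }
}
```

Because this table has clustering columns, a limit can cut a page off in the middle of a partition, so a real implementation also has to finish the last partition before moving on to the next token. That extra bookkeeping is a large part of why the manual pattern is more code than the driver's paging.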
Messages average about 1 KB in size (most of that in the "fields" map).
In this test, the processor.process() call just prints a progress
message to stdout.
In a direct comparison reading our test data set (24.1M rows on a single
node) we see (average of 3 runs each):
* old paging: 908 seconds, 26k rows/sec
* new paging: 1044 seconds, 23k rows/sec
Is this roughly 13% slowdown with the new paging known or expected? If not,
how would we diagnose the cause? We'd definitely prefer to use the new
paging since the code is MUCH simpler.
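
For reference, the arithmetic behind the figures above can be reproduced with a short sketch. The class and method names are hypothetical, and the row count is the rounded 24.1M from this post, so the results are approximate. Measured as throughput the new paging is about 13% lower; measured as wall-clock time it takes about 15% longer.

```java
// Hypothetical helper: recomputes throughput and slowdown from the timings above.
final class PagingThroughput {

    // Rows processed per second for a full table scan.
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    // Percentage drop in throughput going from the old run to the new run.
    static double throughputDropPercent(double oldSeconds, double newSeconds) {
        return (1.0 - oldSeconds / newSeconds) * 100.0;
    }

    // Percentage increase in wall-clock time going from old to new.
    static double runtimeIncreasePercent(double oldSeconds, double newSeconds) {
        return (newSeconds / oldSeconds - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        long rows = 24_100_000L;      // rounded test data set size
        double oldSeconds = 908.0;    // token paging, average of 3 runs
        double newSeconds = 1044.0;   // driver paging, average of 3 runs

        System.out.printf("old: %.0f rows/sec%n", rowsPerSecond(rows, oldSeconds));
        System.out.printf("new: %.0f rows/sec%n", rowsPerSecond(rows, newSeconds));
        System.out.printf("throughput drop: %.1f%%%n",
                throughputDropPercent(oldSeconds, newSeconds));
        System.out.printf("runtime increase: %.1f%%%n",
                runtimeIncreasePercent(oldSeconds, newSeconds));
    }
}
```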