Posted to commits@cassandra.apache.org by "Matthew F. Dennis (JIRA)" <ji...@apache.org> on 2010/06/04 21:56:57 UTC

[jira] Commented: (CASSANDRA-1046) optimize Memtable.getSliceIterator

    [ https://issues.apache.org/jira/browse/CASSANDRA-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875727#action_12875727 ] 

Matthew F. Dennis commented on CASSANDRA-1046:
----------------------------------------------

My profiler (which I'm not sure I trust at the moment) showed different results, and at no point was I able to get timeouts in the client, even using numbers an order of magnitude higher than originally reported.

So I created some scripts to help test this (I still didn't get client timeouts, perhaps because of the UUID changes made previously). The insert script prints time UUIDs for the start, approximate middle, and end of what was inserted. These can be fed into the reader to start from the middle and read the specified number of columns. I ran the scripts by piping the insertator output to tee uuids and calling the readarator with `cat uuids`.

On my laptop these changes reduced the run time of the scripts from about 2.5 minutes to less than 15 seconds (with reversed slices taking a couple seconds more in total).

In addition, I reviewed the callers of ColumnFamily.getSortedColumns (I did not review any test classes). Everything was already iterating. In particular:

{code}
SSTableExport.SerializeRow already iterates
[avro|thrift].CassandraServer
  .thriftifyColumns already iterates
  .thriftify[Super]Columns already iterates
Migration.getLocalMigrations already iterates
SSTableNameIterator.<init> only creates an iterator for later use
QueryFilter.getReduced only creates an iterator and then calls next()
Table.load already iterates
HintedHandoffManager
  .pagingFinished just calls size
  .deliverHintsToEndpoint already iterates
  .deliverAllHints already iterates
DefsTable.loadFromStorage already iterates
CompactionManager.submitGraveyardCleanup already iterates
ColumnIndexer.serialize already iterates
ColumnFamilySerializer.serializeForSSTable already iterates
ColumnFamily
  .toString already iterates
  .addAll already iterates 
{code}
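
For illustration only, here is a minimal sketch of the idea from the issue description: walk the sorted map from the slice start with an iterator instead of copying every column into an array and binary searching it. This is not the attached patch. The class name SliceSketch, the slice helper, and its parameters are hypothetical, and plain Integer/String entries stand in for column names and columns.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical stand-in for the memtable slice logic; not Cassandra code.
class SliceSketch {

    // Returns up to `count` values beginning at `start` (inclusive).
    // When `reversed` is true, walks toward smaller keys instead.
    static <K, V> List<V> slice(ConcurrentNavigableMap<K, V> columns,
                                K start, int count, boolean reversed) {
        // descendingMap() is a view, so no columns are copied here.
        ConcurrentNavigableMap<K, V> ordered =
                reversed ? columns.descendingMap() : columns;

        // tailMap() is also a view: it positions us at `start` without
        // materializing the values (the toArray() call this avoids).
        Iterator<V> iter = ordered.tailMap(start, true).values().iterator();

        List<V> result = new ArrayList<>();
        while (iter.hasNext() && result.size() < count) {
            result.add(iter.next());
        }
        return result;
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, String> columns = new ConcurrentSkipListMap<>();
        for (int i = 0; i < 10; i++) {
            columns.put(i, "col" + i);
        }
        System.out.println(slice(columns, 5, 3, false)); // forward slice from key 5
        System.out.println(slice(columns, 5, 3, true));  // reversed slice from key 5
    }
}
```

Only the columns actually returned are visited after the initial skip-list lookup, so the cost tracks the slice size rather than the row size.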

> optimize Memtable.getSliceIterator
> ----------------------------------
>
>                 Key: CASSANDRA-1046
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1046
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Matthew F. Dennis
>             Fix For: 0.7
>
>
> As reported by James Golick, about 30% of the time in a read is spent in SliceQueryFilter.getMemColumnIterator, virtually all of which is in ConcurrentSkipListMap$Values.toArray().
> I wrote on the ML:
> Besides the UUID optimization you posted, we should do an audit of ColumnFamily.getSortedColumns and replace with iteration where possible (in this case, we'd be left with one copy of most of the columns, but that's better than two).
> We can get rid of the other copy by fixing the logic in Memtable.getSliceIterator, which says "copy all the columns, so we can do a binary search on them to find where to start," but since columns are natively in sorted order we could just use an iterator and a while loo
