Posted to user@cassandra.apache.org by Yang <te...@gmail.com> on 2011/06/29 20:34:31 UTC

Re: custom reconciling columns? (improve performance of long rows)

I hacked around the code. At first I thought that the cost of the map put and
get was due to synchronization, so I tried replacing ConcurrentSkipListMap
with TreeMap: I created a subclass of ColumnFamily and used that subclass only
in the pure read path. Interestingly, on the read path no more than one thread
accesses the returned CF at any time, so the concurrency control can be
removed. But this did not offer any significant change in speed.

Then I tried changing TreeMap to HashMap; this time it took only half the
time. The problem, though, is how to keep the output sorted: doing a sort on
every read is going to be even slower...
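
As a rough illustration of the comparison above, here is a hypothetical
micro-benchmark sketch (not the actual profiling code; the class and method
names are made up, and the 3000-entry size and ~30-byte values just mirror
numbers mentioned later in this thread):

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class MapCostSketch {
    public static void main(String[] args) {
        int n = 3000;                    // roughly the history length discussed below
        byte[] value = new byte[30];     // ~30 bytes of data per column
        time("ConcurrentSkipListMap", new ConcurrentSkipListMap<Long, byte[]>(), n, value);
        time("TreeMap", new TreeMap<Long, byte[]>(), n, value);
        time("HashMap", new HashMap<Long, byte[]>(), n, value);
    }

    // Very rough timing: single run, no JIT warm-up, just to show relative cost.
    static void time(String name, Map<Long, byte[]> map, int n, byte[] value) {
        long start = System.nanoTime();
        for (long i = 0; i < n; i++) map.put(i, value);
        for (long i = 0; i < n; i++) map.get(i);
        System.out.printf("%s: %.3f ms%n", name, (System.nanoTime() - start) / 1e6);
    }
}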




On Tue, Jun 28, 2011 at 10:07 PM, Yang <te...@gmail.com> wrote:

> btw, I use only one box now just because I'm running it in a dev JUnit
> test, not because it's going to be that way in production...
>
>
> On Tue, Jun 28, 2011 at 10:06 PM, Yang <te...@gmail.com> wrote:
>
>> OK, here is the profiling result. I think this is consistent (I've been
>> trying to re-learn how to use YourKit effectively...); see the attached
>> picture.
>>
>> Since I do not actually use the Thrift interface, but directly use
>> thrift.CassandraServer and run my code in the same JVM as Cassandra, and
>> since the whole thing was running on a single box, there is no message
>> serialization/deserialization cost. But more columns did add more time.
>>
>> The time was spent in the ConcurrentSkipListMap operations that back the
>> memtable.
>>
>>
>> Regarding breaking up the row, I'm not sure it would reduce my run time,
>> since our requirement is to read the entire rolling-window history (we
>> already have TTL enabled, so the history is limited to a certain length,
>> but it is quite long: over 1000 columns, and in some cases 5000 or more).
>> I think accessing roughly 1000 items is not an uncommon requirement for
>> many applications. In our case, each column holds about 30 bytes of data,
>> besides metadata such as TTL and timestamp. At a history length of 3000,
>> the read takes about 12 ms (remember this is completely in-memory, no
>> disk access).
>>
>> I just took a look at the expiring-column logic; it looks like expiration
>> does not come into play until CassandraServer.internal_get() ==>
>> thriftifyColumns() gets called, so the memtable access time above is
>> still spent. Yes, breaking up the row would then be helpful, but only to
>> the degree of avoiding access to expired columns (btw, it would be nicer
>> if this were actually built into the Cassandra code: instead of spending
>> multiple key lookups, I would locate the row once, and within the row
>> there would be different "generation" buckets, so old generation buckets
>> beyond the expiration are never read; see the sketch below). Currently,
>> just accessing the 3000 live columns is already quite slow.
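>>
>> As a purely hypothetical sketch of that "generation bucket" idea (this is
>> not existing Cassandra code, and all names are made up): columns are
>> grouped into fixed-width time buckets within a row, so buckets that lie
>> entirely beyond the TTL can be skipped without touching their columns.
>>
>> import java.util.ArrayList;
>> import java.util.List;
>> import java.util.NavigableMap;
>> import java.util.concurrent.ConcurrentNavigableMap;
>> import java.util.concurrent.ConcurrentSkipListMap;
>>
>> class GenerationBucketedRow {
>>     static final long BUCKET_MILLIS = 24L * 3600 * 1000;  // one "generation" per day
>>
>>     // bucket start time -> (column timestamp -> column value)
>>     private final ConcurrentNavigableMap<Long, ConcurrentNavigableMap<Long, byte[]>> buckets =
>>             new ConcurrentSkipListMap<Long, ConcurrentNavigableMap<Long, byte[]>>();
>>
>>     void put(long timestamp, byte[] value) {
>>         long bucketStart = (timestamp / BUCKET_MILLIS) * BUCKET_MILLIS;
>>         ConcurrentNavigableMap<Long, byte[]> bucket = buckets.get(bucketStart);
>>         if (bucket == null) {
>>             bucket = new ConcurrentSkipListMap<Long, byte[]>();
>>             ConcurrentNavigableMap<Long, byte[]> existing = buckets.putIfAbsent(bucketStart, bucket);
>>             if (existing != null)
>>                 bucket = existing;
>>         }
>>         bucket.put(timestamp, value);
>>     }
>>
>>     // Read only the buckets that can still hold live columns; expired
>>     // generations are never visited.
>>     List<byte[]> readLive(long nowMillis, long ttlMillis) {
>>         long oldestLiveBucket = ((nowMillis - ttlMillis) / BUCKET_MILLIS) * BUCKET_MILLIS;
>>         List<byte[]> result = new ArrayList<byte[]>();
>>         for (NavigableMap<Long, byte[]> bucket : buckets.tailMap(oldestLiveBucket, true).values())
>>             result.addAll(bucket.values());
>>         return result;
>>     }
>> }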
>>
>> I'm trying to see whether there is some easy magic-bullet drop-in
>> replacement for ConcurrentSkipListMap...
>>
>> Yang
>>
>>
>>
>>
>> On Tue, Jun 28, 2011 at 4:18 PM, Nate McCall <na...@datastax.com> wrote:
>>
>>> I agree with Aaron's suggestion on data model and query here. Since
>>> there is a time component, you can split the row on a fixed duration
>>> for a given user, so the row key would become userId_[timestamp
>>> rounded to day].
>>>
>>> This gives you an easy way to roll up the information for the date
>>> ranges you need, since the key suffix can be created without a read.
>>> It also has the benefit of spreading the read load over the cluster
>>> instead of over just one row's replicas, since you have 30 rows in this
>>> case instead of one.
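>>>
>>> A small hypothetical helper for this scheme (the class and method names
>>> are illustrative, not from any real library): the row-key suffix is
>>> computed from the timestamp alone, so a write never needs a preceding
>>> read, and a date-range roll-up just enumerates the day keys.
>>>
>>> import java.text.SimpleDateFormat;
>>> import java.util.ArrayList;
>>> import java.util.Date;
>>> import java.util.List;
>>> import java.util.TimeZone;
>>>
>>> public class DayBucketKeys {
>>>     // e.g. rowKey("user42", ts) -> "user42_20110628"
>>>     static String rowKey(String userId, long timestampMillis) {
>>>         return userId + "_" + dayFormat().format(new Date(timestampMillis));
>>>     }
>>>
>>>     // One key per day covering [startMillis, endMillis], for rolling up a date range.
>>>     static List<String> rowKeysForRange(String userId, long startMillis, long endMillis) {
>>>         List<String> keys = new ArrayList<String>();
>>>         long day = 24L * 3600 * 1000;
>>>         for (long t = (startMillis / day) * day; t <= endMillis; t += day)
>>>             keys.add(rowKey(userId, t));
>>>         return keys;
>>>     }
>>>
>>>     private static SimpleDateFormat dayFormat() {
>>>         SimpleDateFormat f = new SimpleDateFormat("yyyyMMdd");
>>>         f.setTimeZone(TimeZone.getTimeZone("UTC"));
>>>         return f;  // created per call because SimpleDateFormat is not thread-safe
>>>     }
>>> }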
>>>
>>> On Tue, Jun 28, 2011 at 5:55 PM, aaron morton <aa...@thelastpickle.com>
>>> wrote:
>>> > Can you provide some more info:
>>> > - how big are the rows, e.g. number of columns and column size  ?
>>> > - how much data are you asking for ?
>>> > - what sort of read query are you using ?
>>> > - what sort of numbers are you seeing ?
>>> > - are you deleting columns or using TTL ?
>>> > I would consider issues with the data churn, data model and query
>>> > before looking at serialisation.
>>> > Cheers
>>> > -----------------
>>> > Aaron Morton
>>> > Freelance Cassandra Developer
>>> > @aaronmorton
>>> > http://www.thelastpickle.com
>>> > On 29 Jun 2011, at 10:37, Yang wrote:
>>> >
>>> > I can see that as my user history grows, the read time grows
>>> > proportionally (or faster than linearly). If my business requirements
>>> > ask me to keep a month's history for each user, it could become too
>>> > slow. I was suspecting that it's actually the serializing and
>>> > deserializing that's taking the time (I can definitely see that it's
>>> > CPU bound).
>>> >
>>> >
>>> > On Tue, Jun 28, 2011 at 3:04 PM, aaron morton <aaron@thelastpickle.com>
>>> > wrote:
>>> >>
>>> >> There is no facility to do custom reconciliation for a column. An
>>> >> append-style operation would run into many of the same problems as
>>> >> the Counter type, e.g. not every node may get an append and there is
>>> >> a chance for lost appends unless you go to all the trouble Counters
>>> >> do.
>>> >>
>>> >> I would go with using a row for the user and columns for each item.
>>> >> Then you can have fast no-look writes.
>>> >>
>>> >> What problems are you seeing with the reads ?
>>> >>
>>> >> Cheers
>>> >>
>>> >>
>>> >> -----------------
>>> >> Aaron Morton
>>> >> Freelance Cassandra Developer
>>> >> @aaronmorton
>>> >> http://www.thelastpickle.com
>>> >>
>>> >> On 29 Jun 2011, at 04:20, Yang wrote:
>>> >>
>>> >> > For example, say I have an application that needs to read off a
>>> >> > user's browsing history, and I model the user ID as the key and the
>>> >> > history data within the row. With the current approach, I could
>>> >> > model each visit as a column. The possible issue is that *possibly*
>>> >> > (I'm still doing a lot of profiling on this to verify) a lot of time
>>> >> > is spent on serialization into and out of the message. Plus, I do
>>> >> > not need the full features provided by the column: for example, I do
>>> >> > not need a timestamp on each visit, etc. So it might be faster to
>>> >> > put the entire history in a blob, where each visit takes up only a
>>> >> > few bytes, and have my code manipulate the blob.
>>> >> >
>>> >> > The problem is, I still need to avoid the read-before-write, so I
>>> >> > send only the latest visit and let Cassandra do the reconcile, which
>>> >> > appends the visit to the blob; this needs custom reconcile behavior.
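>>> >> >
>>> >> > Purely as an illustration of what such an append-style reconcile
>>> >> > would do (hypothetical; Cassandra provides no hook for this, per the
>>> >> > reply earlier in this thread): instead of keeping the value with the
>>> >> > higher timestamp, reconciling two versions would concatenate the blobs.
>>> >> >
>>> >> > // hypothetical append-style reconcile, merging two column values
>>> >> > final class AppendReconcile {
>>> >> >     static byte[] reconcile(byte[] current, byte[] update) {
>>> >> >         byte[] merged = new byte[current.length + update.length];
>>> >> >         System.arraycopy(current, 0, merged, 0, current.length);
>>> >> >         System.arraycopy(update, 0, merged, current.length, update.length);
>>> >> >         return merged;
>>> >> >     }
>>> >> > }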
>>> >> >
>>> >> > Is there a way to incorporate such a custom reconcile under the
>>> >> > current code framework? (I see custom sorting, but no custom
>>> >> > reconcile.)
>>> >> >
>>> >> > thanks
>>> >> > yang
>>> >>
>>> >
>>> >
>>> >
>>>
>>
>>
>