Posted to user@cassandra.apache.org by "Freeman, Tim" <ti...@hp.com> on 2009/12/06 19:09:46 UTC

MySQL has same read latency issue; short benchmarks; slow writes are better than uncompacted data (was RE: Persistently increasing read latency)

I gave MySQL 5.1.38 a try (that's not the clustered version, so never mind what happens if a disk is lost) and I saw the same persistently increasing latency that I saw with Cassandra.

I also tried storing the records in files on the local filesystem (not clustered or transactional, so never mind what happens if a node fails permanently or temporarily).  It apparently came to an equilibrium after 47000 seconds, or around 13 hours.  It's still running, so I'll watch it some more before really believing that the latency won't increase further.

It looks like I'll have to let benchmarks run for a day to determine whether I have true long-term performance numbers.  Yuck.

See the attached chart comparing them.  The vertical scale is milliseconds of latency per read, and the horizontal scale is seconds.  We're reading and writing 350K records of 100 KB each, at around 650 reads per minute and 780 writes per minute, on one 4-core Z600 with one disk drive.  The "write 2 files" part reminds me of a performance bug when writing to the filesystem.
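
For scale, a quick back-of-the-envelope calculation (assuming the records really are 100 KB each, i.e. kilobytes):

records = 350_000
record_bytes = 100 * 1024

dataset_gb = records * record_bytes / 1024**3    # ~33 GB working set
read_mb_s  = 650 * record_bytes / 60 / 1024**2   # ~1.1 MB/s of reads
write_mb_s = 780 * record_bytes / 60 / 1024**2   # ~1.3 MB/s of writes

So the steady-state data rates are on the order of a megabyte per second each way, against a working set of roughly 33 GB.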
 
Here's where I come into disagreement with the following:

From: Jonathan Ellis [mailto:jbellis@gmail.com] 
>I agree that [bounding the compaction backlog is] much much nicer 
>in the sense that it makes it more
>obvious what the problem is (not enough capacity) but it only helps
>diagnosis, not mitigation.

The next thing I would try is to use the local filesystem as a cache in front of some distributed database, perhaps Cassandra.  This is a use case where we can win if writes are slow, but we lose if there's an unbounded compaction backlog that makes reads slow.  This is an attempt to solve a real problem, not something contrived to win a debate.

Suppose I use the local filesystem as a cache in front of Cassandra.  The application would write to the local filesystem and read from the local filesystem.  A background task persists records from the local filesystem to Cassandra.  When reading, if the data isn't on the local filesystem, we get it from Cassandra.  The read load on Cassandra is eliminated, except when a replacement node is starting up.  The write load on Cassandra can be as low as we want, because we can update the file on the local filesystem several times while persisting it to Cassandra only once.  There are details omitted here; this description has the same performance (I hope) as the real idea, but more bugs.  Never mind the bugs for now.
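
To make that concrete, here is a rough sketch of the shape I have in mind (Python, just to illustrate; "backend" is a placeholder for whatever Cassandra client we'd actually use, the names are made up, and keys are assumed to be filesystem-safe):

import os
import threading

class WriteBehindCache:
    """Local-filesystem cache in front of a slower distributed store.

    `backend` is any object with get(key) -> bytes and put(key, value);
    it stands in for a real Cassandra client, which is not shown here.
    """
    def __init__(self, cache_dir, backend):
        self.cache_dir = cache_dir
        self.backend = backend
        self.dirty = set()              # keys written locally but not yet persisted
        self.cond = threading.Condition()
        threading.Thread(target=self._persist_loop, daemon=True).start()

    def _path(self, key):
        return os.path.join(self.cache_dir, key)

    def put(self, key, value):
        # The application only ever writes to the local filesystem.
        with open(self._path(key), "wb") as f:
            f.write(value)
        with self.cond:
            self.dirty.add(key)         # repeated updates to one key collapse here
            self.cond.notify()

    def get(self, key):
        # Reads are served locally; fall back to the backend on a miss,
        # e.g. on a replacement node that is still warming up.
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            value = self.backend.get(key)
            with open(self._path(key), "wb") as f:
                f.write(value)
            return value

    def _persist_loop(self):
        # Background task: push dirty records to the backend, one backend
        # write per key no matter how many local updates happened meanwhile.
        while True:
            with self.cond:
                while not self.dirty:
                    self.cond.wait()
                key = self.dirty.pop()
            with open(self._path(key), "rb") as f:
                self.backend.put(key, f.read())

The point is that the backend only has to keep up on average; if it falls behind for a while, the local cache absorbs it, so long as the backlog on the Cassandra side stays bounded.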

If Cassandra writes things quickly, then we have fresh data in Cassandra, so we have to redo relatively little work when a node fails because less data was lost.

If Cassandra is slow to write but has bounded read time, then we have more stale data in Cassandra.  When a node is replaced, the replacement reads a stale version of the data that was on the original node, and we have to redo more work after the failure.  No big deal.

If Cassandra allows fast writes but accumulates an unbounded backlog of uncompacted files, then I'm in trouble.  The unbounded backlog either fills up the disks, or it makes reads take so long that, when it comes time to recover from a failed node, we can't really read back the data that was persisted while the node was up.  Or perhaps I throttle writes to Cassandra based on guesses about how fast it can safely go.  The problem with that is that different nodes can probably run at different speeds, and any throttling code I write would be outside of Cassandra, so it would have to throttle to the pace of the slowest node, and discovering that pace would be awkward.  I don't know yet what ratio in speed to expect between nodes.
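
If I did end up throttling, it would probably be something crude like a token bucket wrapped around the background persister, with the rate set by hand from a guess about the slowest node (again just a sketch, not anything Cassandra provides):

import threading
import time

class WriteThrottle:
    """Crude token-bucket limiter for writes to the backend.

    rate_per_s is a hand-tuned guess at what the slowest node can absorb;
    nothing here measures the cluster, which is exactly the weakness above.
    """
    def __init__(self, rate_per_s, burst=10):
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = float(burst)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        # Block until there is budget for one write, refilling tokens over time.
        while True:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.burst,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1.0:
                    self.tokens -= 1.0
                    return
            time.sleep(1.0 / self.rate)

The persister would call acquire() before each write to the backend; the ugly part is picking rate_per_s, which is exactly the guessing I'd rather avoid.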

Tim Freeman
Email: tim.freeman@hp.com
Desk in Palo Alto: (650) 857-2581
Home: (408) 774-1298
Cell: (408) 348-7536 (No reception business hours Monday, Tuesday, and Thursday; call my desk instead.)


-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com] 
Sent: Friday, December 04, 2009 9:14 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Persistently increasing read latency

On Fri, Dec 4, 2009 at 10:40 PM, Thorsten von Eicken <tv...@rightscale.com> wrote:
>>> For the first few hours of my load test, I have enough I/O.  The problem
>>> is that Cassandra is spending too much I/O on reads and writes and too
>>> little on compactions to function well in the long term.
>>>
>>
>> If you don't have enough room for both, it doesn't matter how you
>> prioritize.
>>
>
> Mhhh, maybe... You're technically correct. The question here is whether
> cassandra degrades gracefully or not. If I understand correctly, there are
> two ways to look at it:
>
> 1) it's accepting a higher request load than it can actually process and
> builds up an increasing backlog that eventually brings performance down far
> below the level of performance that it could sustain, thus it fails to do
> the type of early admission control or back-pressure that keeps the request
> load close to the sustainable maximum performance.
>
> 2) the compaction backlog size is a primary variable that has to be exposed
> and monitored in any cassandra installation because it's a direct indicator
> for an overload situation, just like hitting 100% cpu or similar would be.
>
> I can buy that (2) is ok, but (1) is certainly nicer.

I agree that it's much much nicer in the sense that it makes it more
obvious what the problem is (not enough capacity) but it only helps
diagnosis, not mitigation.