Posted to user@cassandra.apache.org by Mikio Braun <mi...@cs.tu-berlin.de> on 2010/08/12 11:29:42 UTC

Post on experiences with Cassandra for Twitter retweet analysis

Hello,

I've put up a blog post where I discuss our experiences with using
Cassandra as the main database backend for twimpact. Twimpact is a
research project at the TU Berlin which aims at estimating user impact
based on retweet analysis. A live version of the analysis for the
Japanese market can be seen at http://twimpact.jp

So far, we're very pleased with Cassandra performance, but we've also
had to overcome some issues on which I report in the blog and which are
hopefully interesting for other users of Cassandra.

The blog post can be found here:

http://blog.mikiobraun.de/2010/08/-cassandra-tips.html

-M


-- 
Dr. Mikio Braun                        email: mikio@cs.tu-berlin.de
TU Berlin                              web: ml.cs.tu-berlin.de/~mikio
Franklinstr. 28/29                     tel: +49 30 314 78627
10587 Berlin, Germany




Re: Post on experiences with Cassandra for Twitter retweet analysis

Posted by Mikio Braun <mi...@cs.tu-berlin.de>.
Hi Eric,

no general problems per se with long rows. The only problem was that 
retrieving the whole row took about 10-20 seconds and the timeouts I had 
set (both the Cassandra RPC timeout and the one in the client library I 
used) were shorter, so these requests could never complete. This gave me 
a bit of a headache, so I thought others should be aware of that.

I also changed the way we used these rows to do a slice request to 
extract only the portion relevant for the analysis. This cut down the 
requests to a few seconds, which is fine for our needs.
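
To illustrate the idea of slicing instead of fetching a whole wide row, 
here is a minimal Python sketch. It is only an in-memory model, not the 
Cassandra client API and not the twimpact code: the row layout, the 
column names and the helpers make_row, get_slice and iter_pages are all 
hypothetical. It assumes columns are kept sorted by name and that a 
slice is described by a start name, a finish name (both inclusive) and 
a maximum count.

```python
from bisect import bisect_left, bisect_right


def make_row(n):
    """Build a hypothetical wide row: a name-sorted list of (column_name, value)."""
    return [(f"{i:08d}", f"retweet-{i}") for i in range(n)]


def get_slice(row, start, finish, count):
    """Return up to `count` columns whose names lie in [start, finish].

    Stands in for a slice request: only the requested range is returned,
    so the cost depends on the slice size rather than the full row width.
    """
    names = [name for name, _ in row]  # rebuilt per call; fine for a sketch
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return row[lo:min(hi, lo + count)]


def iter_pages(row, start, finish, page_size):
    """Walk a range in bounded pages so no single request grows too large."""
    while True:
        page = get_slice(row, start, finish, page_size)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return
        # Resume strictly after the last column name seen: appending "\x00"
        # gives the smallest string that sorts after that name.
        start = page[-1][0] + "\x00"


if __name__ == "__main__":
    row = make_row(1000)
    for page in iter_pages(row, "00000100", "00000199", 30):
        print(page[0][0], "...", page[-1][0], len(page))
```

Each page is a small, bounded request, so it can finish well inside a 
timeout that reading the entire row at once would exceed.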

-M

On 12.08.2010 18:55, Eric Evans wrote:
> On Thu, 2010-08-12 at 11:29 +0200, Mikio Braun wrote:
>> So far, we're very pleased with Cassandra performance, but we've also
>> had to overcome some issues on which I report in the blog and which
>> are hopefully interesting for other users of Cassandra.
>>
>> The blog post can be found here:
>>
>> http://blog.mikiobraun.de/2010/08/-cassandra-tips.html
>
> Thanks, this is a nice write up.
>
> I am curious though about the troubles you had using wide rows.  As a
> rule, several hundred thousand columns in a row should not be a problem.
> In fact, this runs contrary to the advice usually given since this
> should be the fastest/most efficient way to retrieve a dataset of that
> size.
>


Re: Post on experiences with Cassandra for Twitter retweet analysis

Posted by Eric Evans <ee...@rackspace.com>.
On Thu, 2010-08-12 at 11:29 +0200, Mikio Braun wrote:
> So far, we're very pleased with Cassandra performance, but we've also
> had to overcome some issues on which I report in the blog and which
> are hopefully interesting for other users of Cassandra.
> 
> The blog post can be found here:
> 
> http://blog.mikiobraun.de/2010/08/-cassandra-tips.html

Thanks, this is a nice write up.

I am curious though about the troubles you had using wide rows.  As a
rule, several hundred thousand columns in a row should not be a problem.
In fact, this runs contrary to the advice usually given since this
should be the fastest/most efficient way to retrieve a dataset of that
size.

-- 
Eric Evans
eevans@rackspace.com