You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cassandra.apache.org by Suhail Doshi <su...@mixpanel.com> on 2010/01/14 08:49:17 UTC

multiget_slice

I've been seeing multiget_slice take an extremely long time:

2010-01-14 07:44:00,513 INFO ------------------ Cassandra, delay:
3.64020800591 -----------------------
2010-01-14 07:44:00,513 INFO method: multiget_slice
2010-01-14 07:44:00,513 INFO {'keys':
[u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2',
u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a',
u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9',
u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac',
u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11'],
'column_parent': ColumnParent(column_family='DistinctIndex',
super_column=None), 'predicate': SlicePredicate(column_names=None,
slice_range=SliceRange(count=14000, start='date_2009-07-01', reversed=False,
finish='date_2010-01-14'))}

2010-01-14 07:44:00,513 INFO result:

2010-01-14 07:44:00,513 INFO
{'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a':
[ColumnOrSuperColumn(column=Column(timestamp=1263231323,
name='date_2010-01-11', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263333256,
name='date_2010-01-12', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263418556,
name='date_2010-01-13', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263451804,
name='date_2010-01-14', value='1'), super_column=None)],
'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9':
[ColumnOrSuperColumn(column=Column(timestamp=1263231323,
name='date_2010-01-11', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263333256,
name='date_2010-01-12', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263418556,
name='date_2010-01-13', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263451804,
name='date_2010-01-14', value='1'), super_column=None)],
'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11':
[ColumnOrSuperColumn(column=Column(timestamp=1263333256,
name='date_2010-01-12', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263418556,
name='date_2010-01-13', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263451804,
name='date_2010-01-14', value='1'), super_column=None)],
'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2':
[ColumnOrSuperColumn(column=Column(timestamp=1263231323,
name='date_2010-01-11', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263333256,
name='date_2010-01-12', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263418556,
name='date_2010-01-13', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263451804,
name='date_2010-01-14', value='1'), super_column=None)],
'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac':
[ColumnOrSuperColumn(column=Column(timestamp=1263231323,
name='date_2010-01-11', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263333256,
name='date_2010-01-12', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263418556,
name='date_2010-01-13', value='1'), super_column=None),
ColumnOrSuperColumn(column=Column(timestamp=1263451804,
name='date_2010-01-14', value='1'), super_column=None)]}


The delay is the time at which it took to run the query and then return a
result. The box has 4GB of RAM and the *JVM_MAX_MEM (-Xmx) is set at 3G*. If
you're curious how I am running it:

/usr/bin/jsvc -home /usr/lib/jvm/java-6-openjdk/jre -pidfile
/var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
-cp
/usr/share/cassandra/antlr-3.1.3.jar:/usr/share/cassandra/apache-cassandra-incubating-0.5.0-rc1.jar:/usr/share/cassandra/apache-cassandra-incubating.jar:/usr/share/cassandra/clhm-production.jar:/usr/share/cassandra/commons-cli-1.1.jar:/usr/share/cassandra/commons-collections-3.2.1.jar:/usr/share/cassandra/commons-lang-2.4.jar:/usr/share/cassandra/google-collect-1.0-rc1.jar:/usr/share/cassandra/high-scale-lib.jar:/usr/share/cassandra/jline-0.9.94.jar:/usr/share/cassandra/json_simple-1.1.jar:/usr/share/cassandra/junit-4.6.jar:/usr/share/cassandra/libthrift-r894924.jar:/usr/share/cassandra/log4j-1.2.15.jar:/usr/share/cassandra/slf4j-api-1.5.8.jar:/usr/share/cassandra/slf4j-log4j12-1.5.8.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
-Xmx3G -Xms128M -Dcassandra -Dstorage-config=/etc/cassandra
-Dcom.sun.management.jmxremote.port=8080
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
org.apache.cassandra.service.CassandraDaemon

I am running version* 0.5.0rc2*.

Does anyone know what the bottleneck might be and how reads using
multiget_slice can be sped up? When I look at the memory used, it's only
about 1700 MB used. The box is not excessively swapping, running
iostat--everything seemed pretty okay.

Suhail

Re: multiget_slice

Posted by Suhail Doshi <di...@gmail.com>.
But as you can see value per column is just a byte so the time at which
it de-serializes the column shouldn't be horrid. Hopefully that's the right
thinking

On Thu, Jan 14, 2010 at 10:06 AM, Suhail Doshi <di...@gmail.com>wrote:

> Right now it's ~5-10 keys, with 5 columns per key.
>
> Later it will be 64 keys (max) with 200 columns per key worst case.
>
> Suhail
>
>
> On Thu, Jan 14, 2010 at 9:45 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> how many keys are you fetching?  how many columns for each key?
>>
>> On Thu, Jan 14, 2010 at 1:49 AM, Suhail Doshi <su...@mixpanel.com>
>> wrote:
>> > I've been seeing multiget_slice take an extremely long time:
>> >
>> > 2010-01-14 07:44:00,513 INFO ------------------ Cassandra, delay:
>> > 3.64020800591 -----------------------
>> > 2010-01-14 07:44:00,513 INFO method: multiget_slice
>> > 2010-01-14 07:44:00,513 INFO {'keys':
>> >
>> [u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2',
>> >
>> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a',
>> >
>> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9',
>> >
>> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac',
>> >
>> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11'],
>> > 'column_parent': ColumnParent(column_family='DistinctIndex',
>> > super_column=None), 'predicate': SlicePredicate(column_names=None,
>> > slice_range=SliceRange(count=14000, start='date_2009-07-01',
>> reversed=False,
>> > finish='date_2010-01-14'))}
>> >
>> > 2010-01-14 07:44:00,513 INFO result:
>> >
>> > 2010-01-14 07:44:00,513 INFO
>> >
>> {'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a':
>> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
>> > name='date_2010-01-11', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> > name='date_2010-01-12', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> > name='date_2010-01-13', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> > name='date_2010-01-14', value='1'), super_column=None)],
>> >
>> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9':
>> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
>> > name='date_2010-01-11', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> > name='date_2010-01-12', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> > name='date_2010-01-13', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> > name='date_2010-01-14', value='1'), super_column=None)],
>> >
>> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11':
>> > [ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> > name='date_2010-01-12', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> > name='date_2010-01-13', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> > name='date_2010-01-14', value='1'), super_column=None)],
>> >
>> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2':
>> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
>> > name='date_2010-01-11', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> > name='date_2010-01-12', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> > name='date_2010-01-13', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> > name='date_2010-01-14', value='1'), super_column=None)],
>> >
>> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac':
>> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
>> > name='date_2010-01-11', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> > name='date_2010-01-12', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> > name='date_2010-01-13', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> > name='date_2010-01-14', value='1'), super_column=None)]}
>> >
>> >
>> > The delay is the time at which it took to run the query and then return
>> a
>> > result. The box has 4GB of RAM and the *JVM_MAX_MEM (-Xmx) is set at
>> 3G*. If
>> > you're curious how I am running it:
>> >
>> > /usr/bin/jsvc -home /usr/lib/jvm/java-6-openjdk/jre -pidfile
>> > /var/run/cassandra.pid -errfile &1 -outfile
>> /var/log/cassandra/output.log
>> > -cp
>> >
>> /usr/share/cassandra/antlr-3.1.3.jar:/usr/share/cassandra/apache-cassandra-incubating-0.5.0-rc1.jar:/usr/share/cassandra/apache-cassandra-incubating.jar:/usr/share/cassandra/clhm-production.jar:/usr/share/cassandra/commons-cli-1.1.jar:/usr/share/cassandra/commons-collections-3.2.1.jar:/usr/share/cassandra/commons-lang-2.4.jar:/usr/share/cassandra/google-collect-1.0-rc1.jar:/usr/share/cassandra/high-scale-lib.jar:/usr/share/cassandra/jline-0.9.94.jar:/usr/share/cassandra/json_simple-1.1.jar:/usr/share/cassandra/junit-4.6.jar:/usr/share/cassandra/libthrift-r894924.jar:/usr/share/cassandra/log4j-1.2.15.jar:/usr/share/cassandra/slf4j-api-1.5.8.jar:/usr/share/cassandra/slf4j-log4j12-1.5.8.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
>> > -Xmx3G -Xms128M -Dcassandra -Dstorage-config=/etc/cassandra
>> > -Dcom.sun.management.jmxremote.port=8080
>> > -Dcom.sun.management.jmxremote.ssl=false
>> > -Dcom.sun.management.jmxremote.authenticate=false
>> > org.apache.cassandra.service.CassandraDaemon
>> >
>> > I am running version* 0.5.0rc2*.
>> >
>> > Does anyone know what the bottleneck might be and how reads using
>> > multiget_slice can be sped up? When I look at the memory used, it's only
>> > about 1700 MB used. The box is not excessively swapping, running
>> > iostat--everything seemed pretty okay.
>> >
>> > Suhail
>> >
>>
>
>
>
> --
> http://mixpanel.com
> Blog: http://blog.mixpanel.com
>



-- 
http://mixpanel.com
Blog: http://blog.mixpanel.com

Re: multiget_slice

Posted by Suhail Doshi <di...@gmail.com>.
Yeah, I think you're right:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda1             34.00    70.00  409.60   11.20 22596.80   649.60    55.24
 32.61   77.67   2.38 100.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00
  0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.25    0.00    0.55   39.72    0.90   56.58

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda1             39.60    79.20  357.40   10.00 20022.40   713.60    56.44
 30.04   81.59   2.70  99.12
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00
  0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.70    0.00    0.40   38.80    0.90   58.20

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda1             43.40    72.00  421.00    9.00 23248.00   648.00    55.57
 32.64   75.74   2.33 100.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00
  0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.00    0.00    0.40   37.41    1.00   59.19

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda1             19.60    58.40  348.20    6.60 19691.20   520.00    56.97
 30.88   86.55   2.78  98.80
sda2              1.40     0.00    0.20    0.00    12.80     0.00    64.00
  0.00   20.00  20.00   0.40

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.65    0.00    0.65   38.32    0.90   57.48

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda1             34.00    61.80  424.40   10.60 24348.80   579.20    57.31
 33.25   76.89   2.30 100.00
sda2              0.00     0.00    0.00    0.00     0.00     0.00     0.00
  0.00    0.00   0.00   0.00

On Thu, Jan 14, 2010 at 10:22 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> it sounds like you just don't have enough ram for the OS to cache your
> "hot" data set so you are getting killed on disk seeks.  iostat -x 5
> (for example) during load should verify this.
>
> On Thu, Jan 14, 2010 at 11:19 AM, Suhail Doshi <di...@gmail.com>
> wrote:
> > Looking at my data directory: 14 G. Just Index.db based files: 4.5 G.
> >
> > Yes only one node so far.
> >
> > vmstat -n 1 -S m
> > procs -----------memory---------- ---swap-- -----io---- -system--
> > ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy
> id
> > wa
> >  0  0     22    585     32   2557    0    0    71    31    5    5  1  0
> 94
> >  3
> >
> > On Thu, Jan 14, 2010 at 10:11 AM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >
> >> how much data do you have on disk?  (only on enode?)  how large are
> >> the columns you are reading?  how much ram does vmstat say is being
> >> used for cache?
> >>
> >> On Thu, Jan 14, 2010 at 11:06 AM, Suhail Doshi <
> digitalwarfare@gmail.com>
> >> wrote:
> >> > Right now it's ~5-10 keys, with 5 columns per key.
> >> >
> >> > Later it will be 64 keys (max) with 200 columns per key worst case.
> >> >
> >> > Suhail
> >> >
> >> > On Thu, Jan 14, 2010 at 9:45 AM, Jonathan Ellis <jb...@gmail.com>
> >> wrote:
> >> >
> >> >> how many keys are you fetching?  how many columns for each key?
> >> >>
> >> >> On Thu, Jan 14, 2010 at 1:49 AM, Suhail Doshi <su...@mixpanel.com>
> >> wrote:
> >> >> > I've been seeing multiget_slice take an extremely long time:
> >> >> >
> >> >> > 2010-01-14 07:44:00,513 INFO ------------------ Cassandra, delay:
> >> >> > 3.64020800591 -----------------------
> >> >> > 2010-01-14 07:44:00,513 INFO method: multiget_slice
> >> >> > 2010-01-14 07:44:00,513 INFO {'keys':
> >> >> >
> >> >>
> >>
> [u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2',
> >> >> >
> >> >>
> >>
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a',
> >> >> >
> >> >>
> >>
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9',
> >> >> >
> >> >>
> >>
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac',
> >> >> >
> >> >>
> >>
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11'],
> >> >> > 'column_parent': ColumnParent(column_family='DistinctIndex',
> >> >> > super_column=None), 'predicate': SlicePredicate(column_names=None,
> >> >> > slice_range=SliceRange(count=14000, start='date_2009-07-01',
> >> >> reversed=False,
> >> >> > finish='date_2010-01-14'))}
> >> >> >
> >> >> > 2010-01-14 07:44:00,513 INFO result:
> >> >> >
> >> >> > 2010-01-14 07:44:00,513 INFO
> >> >> >
> >> >>
> >>
> {'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a':
> >> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> >> >> > name='date_2010-01-11', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> >> >> > name='date_2010-01-12', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> >> >> > name='date_2010-01-13', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> >> >> > name='date_2010-01-14', value='1'), super_column=None)],
> >> >> >
> >> >>
> >>
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9':
> >> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> >> >> > name='date_2010-01-11', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> >> >> > name='date_2010-01-12', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> >> >> > name='date_2010-01-13', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> >> >> > name='date_2010-01-14', value='1'), super_column=None)],
> >> >> >
> >> >>
> >>
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11':
> >> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> >> >> > name='date_2010-01-12', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> >> >> > name='date_2010-01-13', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> >> >> > name='date_2010-01-14', value='1'), super_column=None)],
> >> >> >
> >> >>
> >>
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2':
> >> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> >> >> > name='date_2010-01-11', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> >> >> > name='date_2010-01-12', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> >> >> > name='date_2010-01-13', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> >> >> > name='date_2010-01-14', value='1'), super_column=None)],
> >> >> >
> >> >>
> >>
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac':
> >> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> >> >> > name='date_2010-01-11', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> >> >> > name='date_2010-01-12', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> >> >> > name='date_2010-01-13', value='1'), super_column=None),
> >> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> >> >> > name='date_2010-01-14', value='1'), super_column=None)]}
> >> >> >
> >> >> >
> >> >> > The delay is the time at which it took to run the query and then
> >> return a
> >> >> > result. The box has 4GB of RAM and the *JVM_MAX_MEM (-Xmx) is set
> at
> >> 3G*.
> >> >> If
> >> >> > you're curious how I am running it:
> >> >> >
> >> >> > /usr/bin/jsvc -home /usr/lib/jvm/java-6-openjdk/jre -pidfile
> >> >> > /var/run/cassandra.pid -errfile &1 -outfile
> >> /var/log/cassandra/output.log
> >> >> > -cp
> >> >> >
> >> >>
> >>
> /usr/share/cassandra/antlr-3.1.3.jar:/usr/share/cassandra/apache-cassandra-incubating-0.5.0-rc1.jar:/usr/share/cassandra/apache-cassandra-incubating.jar:/usr/share/cassandra/clhm-production.jar:/usr/share/cassandra/commons-cli-1.1.jar:/usr/share/cassandra/commons-collections-3.2.1.jar:/usr/share/cassandra/commons-lang-2.4.jar:/usr/share/cassandra/google-collect-1.0-rc1.jar:/usr/share/cassandra/high-scale-lib.jar:/usr/share/cassandra/jline-0.9.94.jar:/usr/share/cassandra/json_simple-1.1.jar:/usr/share/cassandra/junit-4.6.jar:/usr/share/cassandra/libthrift-r894924.jar:/usr/share/cassandra/log4j-1.2.15.jar:/usr/share/cassandra/slf4j-api-1.5.8.jar:/usr/share/cassandra/slf4j-log4j12-1.5.8.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
> >> >> > -Xmx3G -Xms128M -Dcassandra -Dstorage-config=/etc/cassandra
> >> >> > -Dcom.sun.management.jmxremote.port=8080
> >> >> > -Dcom.sun.management.jmxremote.ssl=false
> >> >> > -Dcom.sun.management.jmxremote.authenticate=false
> >> >> > org.apache.cassandra.service.CassandraDaemon
> >> >> >
> >> >> > I am running version* 0.5.0rc2*.
> >> >> >
> >> >> > Does anyone know what the bottleneck might be and how reads using
> >> >> > multiget_slice can be sped up? When I look at the memory used, it's
> >> only
> >> >> > about 1700 MB used. The box is not excessively swapping, running
> >> >> > iostat--everything seemed pretty okay.
> >> >> >
> >> >> > Suhail
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > http://mixpanel.com
> >> > Blog: http://blog.mixpanel.com
> >> >
> >>
> >
> >
> >
> > --
> > http://mixpanel.com
> > Blog: http://blog.mixpanel.com
> >
>



-- 
http://mixpanel.com
Blog: http://blog.mixpanel.com

Re: multiget_slice

Posted by Jonathan Ellis <jb...@gmail.com>.
it sounds like you just don't have enough ram for the OS to cache your
"hot" data set so you are getting killed on disk seeks.  iostat -x 5
(for example) during load should verify this.

On Thu, Jan 14, 2010 at 11:19 AM, Suhail Doshi <di...@gmail.com> wrote:
> Looking at my data directory: 14 G. Just Index.db based files: 4.5 G.
>
> Yes only one node so far.
>
> vmstat -n 1 -S m
> procs -----------memory---------- ---swap-- -----io---- -system--
> ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id
> wa
>  0  0     22    585     32   2557    0    0    71    31    5    5  1  0 94
>  3
>
> On Thu, Jan 14, 2010 at 10:11 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> how much data do you have on disk?  (only on enode?)  how large are
>> the columns you are reading?  how much ram does vmstat say is being
>> used for cache?
>>
>> On Thu, Jan 14, 2010 at 11:06 AM, Suhail Doshi <di...@gmail.com>
>> wrote:
>> > Right now it's ~5-10 keys, with 5 columns per key.
>> >
>> > Later it will be 64 keys (max) with 200 columns per key worst case.
>> >
>> > Suhail
>> >
>> > On Thu, Jan 14, 2010 at 9:45 AM, Jonathan Ellis <jb...@gmail.com>
>> wrote:
>> >
>> >> how many keys are you fetching?  how many columns for each key?
>> >>
>> >> On Thu, Jan 14, 2010 at 1:49 AM, Suhail Doshi <su...@mixpanel.com>
>> wrote:
>> >> > I've been seeing multiget_slice take an extremely long time:
>> >> >
>> >> > 2010-01-14 07:44:00,513 INFO ------------------ Cassandra, delay:
>> >> > 3.64020800591 -----------------------
>> >> > 2010-01-14 07:44:00,513 INFO method: multiget_slice
>> >> > 2010-01-14 07:44:00,513 INFO {'keys':
>> >> >
>> >>
>> [u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2',
>> >> >
>> >>
>> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a',
>> >> >
>> >>
>> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9',
>> >> >
>> >>
>> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac',
>> >> >
>> >>
>> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11'],
>> >> > 'column_parent': ColumnParent(column_family='DistinctIndex',
>> >> > super_column=None), 'predicate': SlicePredicate(column_names=None,
>> >> > slice_range=SliceRange(count=14000, start='date_2009-07-01',
>> >> reversed=False,
>> >> > finish='date_2010-01-14'))}
>> >> >
>> >> > 2010-01-14 07:44:00,513 INFO result:
>> >> >
>> >> > 2010-01-14 07:44:00,513 INFO
>> >> >
>> >>
>> {'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a':
>> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
>> >> > name='date_2010-01-11', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> >> > name='date_2010-01-12', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> >> > name='date_2010-01-13', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> >> > name='date_2010-01-14', value='1'), super_column=None)],
>> >> >
>> >>
>> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9':
>> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
>> >> > name='date_2010-01-11', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> >> > name='date_2010-01-12', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> >> > name='date_2010-01-13', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> >> > name='date_2010-01-14', value='1'), super_column=None)],
>> >> >
>> >>
>> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11':
>> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> >> > name='date_2010-01-12', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> >> > name='date_2010-01-13', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> >> > name='date_2010-01-14', value='1'), super_column=None)],
>> >> >
>> >>
>> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2':
>> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
>> >> > name='date_2010-01-11', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> >> > name='date_2010-01-12', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> >> > name='date_2010-01-13', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> >> > name='date_2010-01-14', value='1'), super_column=None)],
>> >> >
>> >>
>> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac':
>> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
>> >> > name='date_2010-01-11', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> >> > name='date_2010-01-12', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> >> > name='date_2010-01-13', value='1'), super_column=None),
>> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> >> > name='date_2010-01-14', value='1'), super_column=None)]}
>> >> >
>> >> >
>> >> > The delay is the time at which it took to run the query and then
>> return a
>> >> > result. The box has 4GB of RAM and the *JVM_MAX_MEM (-Xmx) is set at
>> 3G*.
>> >> If
>> >> > you're curious how I am running it:
>> >> >
>> >> > /usr/bin/jsvc -home /usr/lib/jvm/java-6-openjdk/jre -pidfile
>> >> > /var/run/cassandra.pid -errfile &1 -outfile
>> /var/log/cassandra/output.log
>> >> > -cp
>> >> >
>> >>
>> /usr/share/cassandra/antlr-3.1.3.jar:/usr/share/cassandra/apache-cassandra-incubating-0.5.0-rc1.jar:/usr/share/cassandra/apache-cassandra-incubating.jar:/usr/share/cassandra/clhm-production.jar:/usr/share/cassandra/commons-cli-1.1.jar:/usr/share/cassandra/commons-collections-3.2.1.jar:/usr/share/cassandra/commons-lang-2.4.jar:/usr/share/cassandra/google-collect-1.0-rc1.jar:/usr/share/cassandra/high-scale-lib.jar:/usr/share/cassandra/jline-0.9.94.jar:/usr/share/cassandra/json_simple-1.1.jar:/usr/share/cassandra/junit-4.6.jar:/usr/share/cassandra/libthrift-r894924.jar:/usr/share/cassandra/log4j-1.2.15.jar:/usr/share/cassandra/slf4j-api-1.5.8.jar:/usr/share/cassandra/slf4j-log4j12-1.5.8.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
>> >> > -Xmx3G -Xms128M -Dcassandra -Dstorage-config=/etc/cassandra
>> >> > -Dcom.sun.management.jmxremote.port=8080
>> >> > -Dcom.sun.management.jmxremote.ssl=false
>> >> > -Dcom.sun.management.jmxremote.authenticate=false
>> >> > org.apache.cassandra.service.CassandraDaemon
>> >> >
>> >> > I am running version* 0.5.0rc2*.
>> >> >
>> >> > Does anyone know what the bottleneck might be and how reads using
>> >> > multiget_slice can be sped up? When I look at the memory used, it's
>> only
>> >> > about 1700 MB used. The box is not excessively swapping, running
>> >> > iostat--everything seemed pretty okay.
>> >> >
>> >> > Suhail
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > http://mixpanel.com
>> > Blog: http://blog.mixpanel.com
>> >
>>
>
>
>
> --
> http://mixpanel.com
> Blog: http://blog.mixpanel.com
>

Re: multiget_slice

Posted by Suhail Doshi <di...@gmail.com>.
Looking at my data directory: 14 G. Just Index.db based files: 4.5 G.

Yes only one node so far.

vmstat -n 1 -S m
procs -----------memory---------- ---swap-- -----io---- -system--
----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id
wa
 0  0     22    585     32   2557    0    0    71    31    5    5  1  0 94
 3

On Thu, Jan 14, 2010 at 10:11 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> how much data do you have on disk?  (only on enode?)  how large are
> the columns you are reading?  how much ram does vmstat say is being
> used for cache?
>
> On Thu, Jan 14, 2010 at 11:06 AM, Suhail Doshi <di...@gmail.com>
> wrote:
> > Right now it's ~5-10 keys, with 5 columns per key.
> >
> > Later it will be 64 keys (max) with 200 columns per key worst case.
> >
> > Suhail
> >
> > On Thu, Jan 14, 2010 at 9:45 AM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >
> >> how many keys are you fetching?  how many columns for each key?
> >>
> >> On Thu, Jan 14, 2010 at 1:49 AM, Suhail Doshi <su...@mixpanel.com>
> wrote:
> >> > I've been seeing multiget_slice take an extremely long time:
> >> >
> >> > 2010-01-14 07:44:00,513 INFO ------------------ Cassandra, delay:
> >> > 3.64020800591 -----------------------
> >> > 2010-01-14 07:44:00,513 INFO method: multiget_slice
> >> > 2010-01-14 07:44:00,513 INFO {'keys':
> >> >
> >>
> [u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2',
> >> >
> >>
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a',
> >> >
> >>
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9',
> >> >
> >>
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac',
> >> >
> >>
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11'],
> >> > 'column_parent': ColumnParent(column_family='DistinctIndex',
> >> > super_column=None), 'predicate': SlicePredicate(column_names=None,
> >> > slice_range=SliceRange(count=14000, start='date_2009-07-01',
> >> reversed=False,
> >> > finish='date_2010-01-14'))}
> >> >
> >> > 2010-01-14 07:44:00,513 INFO result:
> >> >
> >> > 2010-01-14 07:44:00,513 INFO
> >> >
> >>
> {'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a':
> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> >> > name='date_2010-01-11', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> >> > name='date_2010-01-12', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> >> > name='date_2010-01-13', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> >> > name='date_2010-01-14', value='1'), super_column=None)],
> >> >
> >>
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9':
> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> >> > name='date_2010-01-11', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> >> > name='date_2010-01-12', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> >> > name='date_2010-01-13', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> >> > name='date_2010-01-14', value='1'), super_column=None)],
> >> >
> >>
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11':
> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> >> > name='date_2010-01-12', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> >> > name='date_2010-01-13', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> >> > name='date_2010-01-14', value='1'), super_column=None)],
> >> >
> >>
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2':
> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> >> > name='date_2010-01-11', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> >> > name='date_2010-01-12', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> >> > name='date_2010-01-13', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> >> > name='date_2010-01-14', value='1'), super_column=None)],
> >> >
> >>
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac':
> >> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> >> > name='date_2010-01-11', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> >> > name='date_2010-01-12', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> >> > name='date_2010-01-13', value='1'), super_column=None),
> >> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> >> > name='date_2010-01-14', value='1'), super_column=None)]}
> >> >
> >> >
> >> > The delay is the time at which it took to run the query and then
> return a
> >> > result. The box has 4GB of RAM and the *JVM_MAX_MEM (-Xmx) is set at
> 3G*.
> >> If
> >> > you're curious how I am running it:
> >> >
> >> > /usr/bin/jsvc -home /usr/lib/jvm/java-6-openjdk/jre -pidfile
> >> > /var/run/cassandra.pid -errfile &1 -outfile
> /var/log/cassandra/output.log
> >> > -cp
> >> >
> >>
> /usr/share/cassandra/antlr-3.1.3.jar:/usr/share/cassandra/apache-cassandra-incubating-0.5.0-rc1.jar:/usr/share/cassandra/apache-cassandra-incubating.jar:/usr/share/cassandra/clhm-production.jar:/usr/share/cassandra/commons-cli-1.1.jar:/usr/share/cassandra/commons-collections-3.2.1.jar:/usr/share/cassandra/commons-lang-2.4.jar:/usr/share/cassandra/google-collect-1.0-rc1.jar:/usr/share/cassandra/high-scale-lib.jar:/usr/share/cassandra/jline-0.9.94.jar:/usr/share/cassandra/json_simple-1.1.jar:/usr/share/cassandra/junit-4.6.jar:/usr/share/cassandra/libthrift-r894924.jar:/usr/share/cassandra/log4j-1.2.15.jar:/usr/share/cassandra/slf4j-api-1.5.8.jar:/usr/share/cassandra/slf4j-log4j12-1.5.8.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
> >> > -Xmx3G -Xms128M -Dcassandra -Dstorage-config=/etc/cassandra
> >> > -Dcom.sun.management.jmxremote.port=8080
> >> > -Dcom.sun.management.jmxremote.ssl=false
> >> > -Dcom.sun.management.jmxremote.authenticate=false
> >> > org.apache.cassandra.service.CassandraDaemon
> >> >
> >> > I am running version* 0.5.0rc2*.
> >> >
> >> > Does anyone know what the bottleneck might be and how reads using
> >> > multiget_slice can be sped up? When I look at the memory used, it's
> only
> >> > about 1700 MB used. The box is not excessively swapping, running
> >> > iostat--everything seemed pretty okay.
> >> >
> >> > Suhail
> >> >
> >>
> >
> >
> >
> > --
> > http://mixpanel.com
> > Blog: http://blog.mixpanel.com
> >
>



-- 
http://mixpanel.com
Blog: http://blog.mixpanel.com

Re: multiget_slice

Posted by Jonathan Ellis <jb...@gmail.com>.
how much data do you have on disk?  (only on enode?)  how large are
the columns you are reading?  how much ram does vmstat say is being
used for cache?

On Thu, Jan 14, 2010 at 11:06 AM, Suhail Doshi <di...@gmail.com> wrote:
> Right now it's ~5-10 keys, with 5 columns per key.
>
> Later it will be 64 keys (max) with 200 columns per key worst case.
>
> Suhail
>
> On Thu, Jan 14, 2010 at 9:45 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> how many keys are you fetching?  how many columns for each key?
>>
>> On Thu, Jan 14, 2010 at 1:49 AM, Suhail Doshi <su...@mixpanel.com> wrote:
>> > I've been seeing multiget_slice take an extremely long time:
>> >
>> > 2010-01-14 07:44:00,513 INFO ------------------ Cassandra, delay:
>> > 3.64020800591 -----------------------
>> > 2010-01-14 07:44:00,513 INFO method: multiget_slice
>> > 2010-01-14 07:44:00,513 INFO {'keys':
>> >
>> [u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2',
>> >
>> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a',
>> >
>> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9',
>> >
>> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac',
>> >
>> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11'],
>> > 'column_parent': ColumnParent(column_family='DistinctIndex',
>> > super_column=None), 'predicate': SlicePredicate(column_names=None,
>> > slice_range=SliceRange(count=14000, start='date_2009-07-01',
>> reversed=False,
>> > finish='date_2010-01-14'))}
>> >
>> > 2010-01-14 07:44:00,513 INFO result:
>> >
>> > 2010-01-14 07:44:00,513 INFO
>> >
>> {'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a':
>> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
>> > name='date_2010-01-11', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> > name='date_2010-01-12', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> > name='date_2010-01-13', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> > name='date_2010-01-14', value='1'), super_column=None)],
>> >
>> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9':
>> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
>> > name='date_2010-01-11', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> > name='date_2010-01-12', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> > name='date_2010-01-13', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> > name='date_2010-01-14', value='1'), super_column=None)],
>> >
>> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11':
>> > [ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> > name='date_2010-01-12', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> > name='date_2010-01-13', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> > name='date_2010-01-14', value='1'), super_column=None)],
>> >
>> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2':
>> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
>> > name='date_2010-01-11', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> > name='date_2010-01-12', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> > name='date_2010-01-13', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> > name='date_2010-01-14', value='1'), super_column=None)],
>> >
>> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac':
>> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
>> > name='date_2010-01-11', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
>> > name='date_2010-01-12', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
>> > name='date_2010-01-13', value='1'), super_column=None),
>> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
>> > name='date_2010-01-14', value='1'), super_column=None)]}
>> >
>> >
>> > The delay is the time at which it took to run the query and then return a
>> > result. The box has 4GB of RAM and the *JVM_MAX_MEM (-Xmx) is set at 3G*.
>> If
>> > you're curious how I am running it:
>> >
>> > /usr/bin/jsvc -home /usr/lib/jvm/java-6-openjdk/jre -pidfile
>> > /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
>> > -cp
>> >
>> /usr/share/cassandra/antlr-3.1.3.jar:/usr/share/cassandra/apache-cassandra-incubating-0.5.0-rc1.jar:/usr/share/cassandra/apache-cassandra-incubating.jar:/usr/share/cassandra/clhm-production.jar:/usr/share/cassandra/commons-cli-1.1.jar:/usr/share/cassandra/commons-collections-3.2.1.jar:/usr/share/cassandra/commons-lang-2.4.jar:/usr/share/cassandra/google-collect-1.0-rc1.jar:/usr/share/cassandra/high-scale-lib.jar:/usr/share/cassandra/jline-0.9.94.jar:/usr/share/cassandra/json_simple-1.1.jar:/usr/share/cassandra/junit-4.6.jar:/usr/share/cassandra/libthrift-r894924.jar:/usr/share/cassandra/log4j-1.2.15.jar:/usr/share/cassandra/slf4j-api-1.5.8.jar:/usr/share/cassandra/slf4j-log4j12-1.5.8.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
>> > -Xmx3G -Xms128M -Dcassandra -Dstorage-config=/etc/cassandra
>> > -Dcom.sun.management.jmxremote.port=8080
>> > -Dcom.sun.management.jmxremote.ssl=false
>> > -Dcom.sun.management.jmxremote.authenticate=false
>> > org.apache.cassandra.service.CassandraDaemon
>> >
>> > I am running version* 0.5.0rc2*.
>> >
>> > Does anyone know what the bottleneck might be and how reads using
>> > multiget_slice can be sped up? When I look at the memory used, it's only
>> > about 1700 MB used. The box is not excessively swapping, running
>> > iostat--everything seemed pretty okay.
>> >
>> > Suhail
>> >
>>
>
>
>
> --
> http://mixpanel.com
> Blog: http://blog.mixpanel.com
>

Re: multiget_slice

Posted by Suhail Doshi <di...@gmail.com>.
Right now it's ~5-10 keys, with 5 columns per key.

Later it will be 64 keys (max) with 200 columns per key worst case.

Suhail

On Thu, Jan 14, 2010 at 9:45 AM, Jonathan Ellis <jb...@gmail.com> wrote:

> how many keys are you fetching?  how many columns for each key?
>
> On Thu, Jan 14, 2010 at 1:49 AM, Suhail Doshi <su...@mixpanel.com> wrote:
> > I've been seeing multiget_slice take an extremely long time:
> >
> > 2010-01-14 07:44:00,513 INFO ------------------ Cassandra, delay:
> > 3.64020800591 -----------------------
> > 2010-01-14 07:44:00,513 INFO method: multiget_slice
> > 2010-01-14 07:44:00,513 INFO {'keys':
> >
> [u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2',
> >
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a',
> >
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9',
> >
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac',
> >
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11'],
> > 'column_parent': ColumnParent(column_family='DistinctIndex',
> > super_column=None), 'predicate': SlicePredicate(column_names=None,
> > slice_range=SliceRange(count=14000, start='date_2009-07-01',
> reversed=False,
> > finish='date_2010-01-14'))}
> >
> > 2010-01-14 07:44:00,513 INFO result:
> >
> > 2010-01-14 07:44:00,513 INFO
> >
> {'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a':
> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> > name='date_2010-01-11', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> > name='date_2010-01-12', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> > name='date_2010-01-13', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> > name='date_2010-01-14', value='1'), super_column=None)],
> >
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9':
> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> > name='date_2010-01-11', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> > name='date_2010-01-12', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> > name='date_2010-01-13', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> > name='date_2010-01-14', value='1'), super_column=None)],
> >
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11':
> > [ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> > name='date_2010-01-12', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> > name='date_2010-01-13', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> > name='date_2010-01-14', value='1'), super_column=None)],
> >
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2':
> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> > name='date_2010-01-11', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> > name='date_2010-01-12', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> > name='date_2010-01-13', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> > name='date_2010-01-14', value='1'), super_column=None)],
> >
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac':
> > [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> > name='date_2010-01-11', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> > name='date_2010-01-12', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> > name='date_2010-01-13', value='1'), super_column=None),
> > ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> > name='date_2010-01-14', value='1'), super_column=None)]}
> >
> >
> > The delay is the time at which it took to run the query and then return a
> > result. The box has 4GB of RAM and the *JVM_MAX_MEM (-Xmx) is set at 3G*.
> If
> > you're curious how I am running it:
> >
> > /usr/bin/jsvc -home /usr/lib/jvm/java-6-openjdk/jre -pidfile
> > /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
> > -cp
> >
> /usr/share/cassandra/antlr-3.1.3.jar:/usr/share/cassandra/apache-cassandra-incubating-0.5.0-rc1.jar:/usr/share/cassandra/apache-cassandra-incubating.jar:/usr/share/cassandra/clhm-production.jar:/usr/share/cassandra/commons-cli-1.1.jar:/usr/share/cassandra/commons-collections-3.2.1.jar:/usr/share/cassandra/commons-lang-2.4.jar:/usr/share/cassandra/google-collect-1.0-rc1.jar:/usr/share/cassandra/high-scale-lib.jar:/usr/share/cassandra/jline-0.9.94.jar:/usr/share/cassandra/json_simple-1.1.jar:/usr/share/cassandra/junit-4.6.jar:/usr/share/cassandra/libthrift-r894924.jar:/usr/share/cassandra/log4j-1.2.15.jar:/usr/share/cassandra/slf4j-api-1.5.8.jar:/usr/share/cassandra/slf4j-log4j12-1.5.8.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
> > -Xmx3G -Xms128M -Dcassandra -Dstorage-config=/etc/cassandra
> > -Dcom.sun.management.jmxremote.port=8080
> > -Dcom.sun.management.jmxremote.ssl=false
> > -Dcom.sun.management.jmxremote.authenticate=false
> > org.apache.cassandra.service.CassandraDaemon
> >
> > I am running version* 0.5.0rc2*.
> >
> > Does anyone know what the bottleneck might be and how reads using
> > multiget_slice can be sped up? When I look at the memory used, it's only
> > about 1700 MB used. The box is not excessively swapping, running
> > iostat--everything seemed pretty okay.
> >
> > Suhail
> >
>



-- 
http://mixpanel.com
Blog: http://blog.mixpanel.com

Re: multiget_slice

Posted by Jonathan Ellis <jb...@gmail.com>.
how many keys are you fetching?  how many columns for each key?

On Thu, Jan 14, 2010 at 1:49 AM, Suhail Doshi <su...@mixpanel.com> wrote:
> I've been seeing multiget_slice take an extremely long time:
>
> 2010-01-14 07:44:00,513 INFO ------------------ Cassandra, delay:
> 3.64020800591 -----------------------
> 2010-01-14 07:44:00,513 INFO method: multiget_slice
> 2010-01-14 07:44:00,513 INFO {'keys':
> [u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2',
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a',
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9',
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac',
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11'],
> 'column_parent': ColumnParent(column_family='DistinctIndex',
> super_column=None), 'predicate': SlicePredicate(column_names=None,
> slice_range=SliceRange(count=14000, start='date_2009-07-01', reversed=False,
> finish='date_2010-01-14'))}
>
> 2010-01-14 07:44:00,513 INFO result:
>
> 2010-01-14 07:44:00,513 INFO
> {'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a':
> [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> name='date_2010-01-11', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> name='date_2010-01-12', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> name='date_2010-01-13', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> name='date_2010-01-14', value='1'), super_column=None)],
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9':
> [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> name='date_2010-01-11', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> name='date_2010-01-12', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> name='date_2010-01-13', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> name='date_2010-01-14', value='1'), super_column=None)],
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11':
> [ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> name='date_2010-01-12', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> name='date_2010-01-13', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> name='date_2010-01-14', value='1'), super_column=None)],
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2':
> [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> name='date_2010-01-11', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> name='date_2010-01-12', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> name='date_2010-01-13', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> name='date_2010-01-14', value='1'), super_column=None)],
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac':
> [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> name='date_2010-01-11', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> name='date_2010-01-12', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> name='date_2010-01-13', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> name='date_2010-01-14', value='1'), super_column=None)]}
>
>
> The delay is the time at which it took to run the query and then return a
> result. The box has 4GB of RAM and the *JVM_MAX_MEM (-Xmx) is set at 3G*. If
> you're curious how I am running it:
>
> /usr/bin/jsvc -home /usr/lib/jvm/java-6-openjdk/jre -pidfile
> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
> -cp
> /usr/share/cassandra/antlr-3.1.3.jar:/usr/share/cassandra/apache-cassandra-incubating-0.5.0-rc1.jar:/usr/share/cassandra/apache-cassandra-incubating.jar:/usr/share/cassandra/clhm-production.jar:/usr/share/cassandra/commons-cli-1.1.jar:/usr/share/cassandra/commons-collections-3.2.1.jar:/usr/share/cassandra/commons-lang-2.4.jar:/usr/share/cassandra/google-collect-1.0-rc1.jar:/usr/share/cassandra/high-scale-lib.jar:/usr/share/cassandra/jline-0.9.94.jar:/usr/share/cassandra/json_simple-1.1.jar:/usr/share/cassandra/junit-4.6.jar:/usr/share/cassandra/libthrift-r894924.jar:/usr/share/cassandra/log4j-1.2.15.jar:/usr/share/cassandra/slf4j-api-1.5.8.jar:/usr/share/cassandra/slf4j-log4j12-1.5.8.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
> -Xmx3G -Xms128M -Dcassandra -Dstorage-config=/etc/cassandra
> -Dcom.sun.management.jmxremote.port=8080
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> org.apache.cassandra.service.CassandraDaemon
>
> I am running version* 0.5.0rc2*.
>
> Does anyone know what the bottleneck might be and how reads using
> multiget_slice can be sped up? When I look at the memory used, it's only
> about 1700 MB used. The box is not excessively swapping, running
> iostat--everything seemed pretty okay.
>
> Suhail
>

Re: multiget_slice

Posted by Suhail Doshi <su...@mixpanel.com>.
Also I played around with SlicedBufferSizeInKB but didn't notice any drastic
change one way or another in the short time that I used it. I had increased
it to 256.

On Thu, Jan 14, 2010 at 12:49 AM, Suhail Doshi <su...@mixpanel.com> wrote:

> I've been seeing multiget_slice take an extremely long time:
>
> 2010-01-14 07:44:00,513 INFO ------------------ Cassandra, delay:
> 3.64020800591 -----------------------
> 2010-01-14 07:44:00,513 INFO method: multiget_slice
> 2010-01-14 07:44:00,513 INFO {'keys':
> [u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2',
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a',
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9',
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac',
> u'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11'],
> 'column_parent': ColumnParent(column_family='DistinctIndex',
> super_column=None), 'predicate': SlicePredicate(column_names=None,
> slice_range=SliceRange(count=14000, start='date_2009-07-01', reversed=False,
> finish='date_2010-01-14'))}
>
> 2010-01-14 07:44:00,513 INFO result:
>
> 2010-01-14 07:44:00,513 INFO
> {'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:fe33779b0db3213f7e354c8e22ad9939:4df200d45716195e86c09a94a54a0c7a':
> [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> name='date_2010-01-11', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> name='date_2010-01-12', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> name='date_2010-01-13', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> name='date_2010-01-14', value='1'), super_column=None)],
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:71860c77c6745379b0d44304d66b6a13:e37f0136aa3ffaf149b351f6a4c948e9':
> [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> name='date_2010-01-11', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> name='date_2010-01-12', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> name='date_2010-01-13', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> name='date_2010-01-14', value='1'), super_column=None)],
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a6d5b5c3d715b79b59caf7aed18301ac:b53b3a3d6ab90ce0268229151c9bde11':
> [ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> name='date_2010-01-12', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> name='date_2010-01-13', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> name='date_2010-01-14', value='1'), super_column=None)],
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:a93ec971e867b23664d990336ce481e0:7516fd43adaa5e0b8a65a672c39845d2':
> [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> name='date_2010-01-11', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> name='date_2010-01-12', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> name='date_2010-01-13', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> name='date_2010-01-14', value='1'), super_column=None)],
> 'property:1558:1f0351b7f85b4aa070548e5fd5e08ddf:fce1eab4411d5df240d93ff334f15385:1240f61999709d41292f759e500ad5be:69691c7bdcc3ce6d5d8a1361f22d04ac':
> [ColumnOrSuperColumn(column=Column(timestamp=1263231323,
> name='date_2010-01-11', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263333256,
> name='date_2010-01-12', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263418556,
> name='date_2010-01-13', value='1'), super_column=None),
> ColumnOrSuperColumn(column=Column(timestamp=1263451804,
> name='date_2010-01-14', value='1'), super_column=None)]}
>
>
> The delay is the time at which it took to run the query and then return a
> result. The box has 4GB of RAM and the *JVM_MAX_MEM (-Xmx) is set at 3G*.
> If you're curious how I am running it:
>
> /usr/bin/jsvc -home /usr/lib/jvm/java-6-openjdk/jre -pidfile
> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
> -cp
> /usr/share/cassandra/antlr-3.1.3.jar:/usr/share/cassandra/apache-cassandra-incubating-0.5.0-rc1.jar:/usr/share/cassandra/apache-cassandra-incubating.jar:/usr/share/cassandra/clhm-production.jar:/usr/share/cassandra/commons-cli-1.1.jar:/usr/share/cassandra/commons-collections-3.2.1.jar:/usr/share/cassandra/commons-lang-2.4.jar:/usr/share/cassandra/google-collect-1.0-rc1.jar:/usr/share/cassandra/high-scale-lib.jar:/usr/share/cassandra/jline-0.9.94.jar:/usr/share/cassandra/json_simple-1.1.jar:/usr/share/cassandra/junit-4.6.jar:/usr/share/cassandra/libthrift-r894924.jar:/usr/share/cassandra/log4j-1.2.15.jar:/usr/share/cassandra/slf4j-api-1.5.8.jar:/usr/share/cassandra/slf4j-log4j12-1.5.8.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
> -Xmx3G -Xms128M -Dcassandra -Dstorage-config=/etc/cassandra
> -Dcom.sun.management.jmxremote.port=8080
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> org.apache.cassandra.service.CassandraDaemon
>
> I am running version* 0.5.0rc2*.
>
> Does anyone know what the bottleneck might be and how reads using
> multiget_slice can be sped up? When I look at the memory used, it's only
> about 1700 MB used. The box is not excessively swapping, running
> iostat--everything seemed pretty okay.
>
> Suhail
>



-- 
http://mixpanel.com
Blog: http://blog.mixpanel.com