Posted to user@cassandra.apache.org by Dathan Pattishall <da...@gmail.com> on 2010/07/21 05:59:06 UTC

what causes a cassandra to block and throw a null exception

Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
cassandra> connect cass01/9160
cassandra> get TimeFrameClicks.Standard2['test_cassandra_alive']
Exception null

The data exists, and I can read it after I restart all the nodes,
but once the cluster has run for a few minutes I cannot read this
specific key or other random keys. It takes about 3 seconds for the
"Exception null" message to appear. My storage-conf.xml is very simple:

<Keyspace Name="TimeFrameClicks">
  <ColumnFamily Name="Standard2" CompareWith="UTF8Type" />
....

My data set is very small, about 20 GB across 4 servers. Writes
consistently remain fast, yet reads fail constantly. I hope it's
something that I am doing wrong, because

nodetool cfstats

says that the read latency for the keyspace and this specific column
family is less than 0.3 ms, which means that something is lying to me.

To head off some questions:

CPU utilization is very low.
There is hardly any I/O on the boxes.
The servers are all the same class of enterprise box.
There is 12 GB of RAM per server.
Each server uses a local RAID.
Nothing in any of the system logs indicates a problem.

Additionally, is there a stat or series of stats that I can look up to
determine the health of read performance?

Re: what causes a cassandra to block and throw a null exception

Posted by Dathan Pattishall <da...@gmail.com>.
Here is the tpstats output from one of the nodes.

Pool Name                    Active   Pending      Completed
STREAM-STAGE                      0         0              0
RESPONSE-STAGE                    0         0         151071
ROW-READ-STAGE                    0         0         100398
LB-OPERATIONS                     0         0              0
MESSAGE-DESERIALIZER-POOL         0         0         281268
GMFD                              0         0            935
LB-TARGET                         0         0              0
CONSISTENCY-MANAGER               0         0          59545
ROW-MUTATION-STAGE                0         0          71453
MESSAGE-STREAMING-POOL            0         0              0
LOAD-BALANCER-STAGE               0         0              0
FLUSH-SORTER-POOL                 0         0              0
MEMTABLE-POST-FLUSHER             0         0              1
FLUSH-WRITER-POOL                 0         0              1
AE-SERVICE-STAGE                  0         0              0
HINTED-HANDOFF-POOL               0         0              3
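
For anyone who wants to check this kind of output across several nodes, here is a minimal Python sketch that parses text in the layout shown above and lists any pool with a non-zero Active or Pending count. The function names (`parse_tpstats`, `backed_up`) are made up for illustration, the sample is a subset of the rows above, and the idea that a non-zero Pending count on a stage such as ROW-READ-STAGE suggests queued work is an assumption, not something confirmed in this thread.

```python
# Hypothetical helpers for reading `nodetool tpstats`-style text.
# Assumes each data row is: pool name, Active, Pending, Completed.

SAMPLE = """\
Pool Name                    Active   Pending      Completed
RESPONSE-STAGE                    0         0         151071
ROW-READ-STAGE                    0         0         100398
MESSAGE-DESERIALIZER-POOL         0         0         281268
ROW-MUTATION-STAGE                0         0          71453
"""


def parse_tpstats(text):
    """Return {pool_name: (active, pending, completed)} for each data row."""
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # A data row has exactly four fields; the header splits into five.
        if len(parts) != 4:
            continue
        name, *counts = parts
        # Skip anything whose last three fields are not plain integers.
        if not all(c.isdigit() for c in counts):
            continue
        active, pending, completed = (int(c) for c in counts)
        pools[name] = (active, pending, completed)
    return pools


def backed_up(pools):
    """List pools that currently have active or pending tasks."""
    return [
        name
        for name, (active, pending, _completed) in pools.items()
        if active > 0 or pending > 0
    ]


if __name__ == "__main__":
    pools = parse_tpstats(SAMPLE)
    print("pools with active or pending work:", backed_up(pools))
```

On the rows posted above every Active and Pending value is 0, so the list comes back empty; that is, nothing in this snapshot shows a thread pool with queued work.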


On Tue, Jul 20, 2010 at 9:03 PM, Chris Goffinet <cg...@chrisgoffinet.com>
wrote:
> Can you provide the output from `nodetool tpstats`?
>
> -Chris

Re: what causes a cassandra to block and throw a null exception

Posted by Chris Goffinet <cg...@chrisgoffinet.com>.
Can you provide the output from `nodetool tpstats`?

-Chris
