Posted to user@cassandra.apache.org by Jacob Rhoden <ja...@me.com> on 2014/02/17 02:01:42 UTC

Where do I start to get to the bottom of this WriteTimeout issue?

Hi Guys,

While doing a bulk load of data, incrementing counters based on log data, I am encountering a timeout exception. I'm using the DataStax 2.0-rc2 Java driver. The driver is set up with both nodes as contact points. The keyspace is set up with replication factor = 2.

Is this an issue with the server not being able to keep up (i.e. a server-side config issue), or is this an issue with the client getting bored waiting for the server to respond? Either way, is there an easy config change I can make that would fix this?

    Exception in thread "main" com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    	at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
    	at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:271)
    	at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:187)
    	at com.datastax.driver.core.Session.execute(Session.java:126)
    	at jason.Stats.analyseLogMessages(Stats.java:91)
    	at jason.Stats.main(Stats.java:48)
    Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    	at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
    	at com.datastax.driver.core.Responses$Error.asException(Responses.java:92)
    	at com.datastax.driver.core.ResultSetFuture$ResponseCallback.onSet(ResultSetFuture.java:122)
    	at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:224)
    	at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:373)
    	at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:510)
    	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    	at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
    	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    	at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    	at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    	at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    	at java.lang.Thread.run(Thread.java:744)
    Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    	at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:53)
    	at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:33)
    	at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:165)
    	at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
    	... 21 more

Thanks,
Jacob
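A common client-side way to ride out transient WriteTimeoutExceptions during a bulk load, without simply hiding them, is to retry the write a few times with backoff. Below is a minimal, self-contained sketch; the driver's WriteTimeoutException is stood in for by a plain RuntimeException, and all names are illustrative, not the thread author's code:

```java
import java.util.concurrent.Callable;

public class RetryingWriter {

    // Run an operation, retrying on failure with exponential backoff.
    static <T> T executeWithRetry(Callable<T> op, int maxAttempts, long baseBackoffMs)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (RuntimeException e) { // e.g. WriteTimeoutException from the driver
                if (attempt >= maxAttempts) {
                    throw e; // give up: the cluster is genuinely unhealthy
                }
                Thread.sleep(baseBackoffMs << (attempt - 1)); // base, 2x base, 4x base...
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate a write that times out twice, then succeeds.
        final int[] calls = {0};
        String result = executeWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new RuntimeException("simulated WriteTimeoutException");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```

One caveat for this particular workload: counter increments are not idempotent, and a write that timed out may still have been applied on the server, so blind retries of counter writes can over-count.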


Re: Where do I start to get to the bottom of this WriteTimeout issue?

Posted by Jacob Rhoden <ja...@me.com>.
Hi Guys,

On 17 Feb 2014, at 3:13 pm, Thunder Stumpges <th...@gmail.com> wrote:
> If you are looking for write throughput and running on a VM, you could likely have IO issues with your virtual disks. Best practice is to put the write-ahead log on a separate disk from the data folder(s). Not sure if you have done this or what physical setup you have under the VM, but I would also examine your IO while you are doing this. Is there other load on the system, either read or write, while this is happening?

Good points! This is probably it. If my understanding of those timeout config settings is correct (i.e. a max 2 second write wait), then Cassandra would be much more sensitive to times of VPS disk or CPU contention on a virtual machine. That is to say, MySQL doesn't error out if your VPS freezes with CPU/disk/network contention for 2 seconds, but it seems that Cassandra would. Is my understanding correct?

Best regards,
Jacob
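For reference, the server-side timeouts being discussed live in cassandra.yaml on each node. A sketch of the relevant knobs, with what I believe are the 2.0-era defaults (verify against the yaml shipped with your version); note that counter writes get their own, longer timeout:

```
# cassandra.yaml -- coordinator timeouts (defaults shown are believed 2.0 values)
write_request_timeout_in_ms: 2000          # wait for replica acks on normal writes
counter_write_request_timeout_in_ms: 5000  # counter writes get a longer budget
read_request_timeout_in_ms: 5000           # same idea for single-partition reads
```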

Re: Where do I start to get to the bottom of this WriteTimeout issue?

Posted by Thunder Stumpges <th...@gmail.com>.
If you are looking for write throughput and running on a VM, you could likely have IO issues with your virtual disks. Best practice is to put the write-ahead log on a separate disk from the data folder(s). Not sure if you have done this or what physical setup you have under the VM, but I would also examine your IO while you are doing this. Is there other load on the system, either read or write, while this is happening?

-Thunder
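The disk split Thunder describes is configured in cassandra.yaml; a sketch with hypothetical mount points:

```
# cassandra.yaml -- keep the commit log off the data disk so sequential
# commit-log appends don't compete with data reads and flushes
# (paths below are illustrative)
commitlog_directory: /mnt/disk1/cassandra/commitlog
data_file_directories:
    - /mnt/disk2/cassandra/data
```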


> On Feb 16, 2014, at 8:04 PM, Erick Ramirez <er...@ramirez.com.au> wrote:
> 
> Jacob,
> 
> You are right in that increasing the timeout to 20,000ms (20 seconds) is a real concern as it just hides an underlying issue with your environment. Without additional information, I was suspecting that this could be due to the environment not being optimised.
> 
> These write timeouts can occur when the systems are under load or low on resources. My questioning around memory stems from the possibility that your system(s) are under load due to GC, which would point to the JVM running low on memory.
> 
> Have a look at the logs as they will give you clues as to what is happening, and possibly the cause of the issue. And keep us posted. Thanks!
> 
> Cheers,
> Erick
> 
> 
> 
>> On Mon, Feb 17, 2014 at 1:41 PM, Jacob Rhoden <ja...@me.com> wrote:
>> Hi Erick,
>> 
>>> On 17 Feb 2014, at 1:19 pm, Erick Ramirez <er...@ramirez.com.au> wrote:
>>> Are you able to post log snippets around the time that the timeouts occur?
>>> 
>>> I have a suspicion you may be running out of heap memory and might need to tune your environment. The INFO entries in the log should indicate this.
>> 
>> I'm kicking off the load and not watching it, so I don't have a timestamp to see where it occurred. After some mucking around I worked out that adding an extra zero to the following parameter on both nodes makes the problem go away:
>> 
>> write_request_timeout_in_ms: 20000
>> 
>> Whatever that parameter exactly controls, I'm pretty sure I don't want to keep a 20s write timeout :D but it allows my bulk loads to run for the time being.
>> 
>> The nodes are running on some test VMs with Xmx/Xms set at 1 GB. So are you assuming that bulk counter row adding/incrementing can cause memory issues? How much memory do you need to allocate before this category of problem would disappear?
>> 
>> Thanks,
>> Jacob
> 

Re: Where do I start to get to the bottom of this WriteTimeout issue?

Posted by Erick Ramirez <er...@ramirez.com.au>.
Jacob,

You are right in that increasing the timeout to 20,000ms (20 seconds) is a
real concern as it just hides an underlying issue with your environment.
Without additional information, I was suspecting that this could be due to
the environment not being optimised.

These write timeouts can occur when the systems are under load or low on
resources. My questioning around memory stems from the possibility that your
system(s) are under load due to GC, which would point to the JVM running
low on memory.

Have a look at the logs as they will give you clues as to what is
happening, and possibly the cause of the issue. And keep us posted. Thanks!

Cheers,
Erick
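One way to act on Erick's suggestion: Cassandra's GCInspector reports long collections at INFO level in system.log. Below is a hedged sketch that builds a tiny sample log and flags pauses over 500 ms; the exact log line format varies by Cassandra version, so the pattern and sample lines are assumptions to adjust for your install:

```shell
# Build a small sample system.log (format assumed from the 1.2/2.0 era).
cat > /tmp/sample-system.log <<'EOF'
 INFO [ScheduledTasks:1] 2014-02-17 01:55:02,123 GCInspector.java (line 116) GC for ParNew: 212 ms for 1 collections, 512345678 used; max is 1052770304
 INFO [ScheduledTasks:1] 2014-02-17 01:58:41,456 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 2857 ms for 2 collections, 998877665 used; max is 1052770304
EOF

# Print only GCInspector lines whose pause (the number before "ms") exceeds 500 ms.
awk '/GCInspector/ { for (i = 1; i <= NF; i++) if ($i == "ms" && $(i-1)+0 > 500) print }' /tmp/sample-system.log
```

On a real node, point the awk at /var/log/cassandra/system.log (or wherever your install writes it); frequent multi-second pauses at INFO level line up well with the 2-second write timeout being exceeded.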



On Mon, Feb 17, 2014 at 1:41 PM, Jacob Rhoden <ja...@me.com> wrote:

> Hi Erick,
>
> On 17 Feb 2014, at 1:19 pm, Erick Ramirez <er...@ramirez.com.au> wrote:
>
> Are you able to post log snippets around the time that the timeouts occur?
>
> I have a suspicion you may be running out of heap memory and might need to
> tune your environment. The INFO entries in the log should indicate this.
>
>
> I'm kicking off the load and not watching it, so I don't have a timestamp to
> see where it occurred. After some mucking around I worked out that adding
> an extra zero to the following parameter on both nodes makes the problem
> go away:
>
> write_request_timeout_in_ms: 20000
>
> Whatever that parameter exactly controls, I'm pretty sure I don't want to
> keep a 20s write timeout :D but it allows my bulk loads to run for the time
> being.
>
> The nodes are running on some test VMs with Xmx/Xms set at 1 GB. So are
> you assuming that bulk counter row adding/incrementing can cause memory
> issues? How much memory do you need to allocate before this category of
> problem would disappear?
>
> Thanks,
> Jacob
>

Re: Where do I start to get to the bottom of this WriteTimeout issue?

Posted by Jacob Rhoden <ja...@me.com>.
Hi Erick,

On 17 Feb 2014, at 1:19 pm, Erick Ramirez <er...@ramirez.com.au> wrote:
> Are you able to post log snippets around the time that the timeouts occur?
> 
> I have a suspicion you may be running out of heap memory and might need to tune your environment. The INFO entries in the log should indicate this.

I'm kicking off the load and not watching it, so I don't have a timestamp to see where it occurred. After some mucking around I worked out that adding an extra zero to the following parameter on both nodes makes the problem go away:

write_request_timeout_in_ms: 20000

Whatever that parameter exactly controls, I'm pretty sure I don't want to keep a 20s write timeout :D but it allows my bulk loads to run for the time being.

The nodes are running on some test VMs with Xmx/Xms set at 1 GB. So are you assuming that bulk counter row adding/incrementing can cause memory issues? How much memory do you need to allocate before this category of problem would disappear?

Thanks,
Jacob
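On the sizing question: there is no single right number, but the heap is set in conf/cassandra-env.sh rather than passed as raw Xmx/Xms flags. A purely illustrative sketch (values are hypothetical, not a recommendation; a 1 GB heap is generally considered very tight for a counter-heavy bulk load):

```
# conf/cassandra-env.sh -- override the auto-calculated heap
# (values illustrative; leave both unset to let Cassandra pick)
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="800M"
```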

Re: Where do I start to get to the bottom of this WriteTimeout issue?

Posted by Erick Ramirez <er...@ramirez.com.au>.
Jacob,

Are you able to post log snippets around the time that the timeouts occur?

I have a suspicion you may be running out of heap memory and might need to
tune your environment. The INFO entries in the log should indicate this.

Cheers,
Erick