Posted to user@cassandra.apache.org by Shashi Yachavaram <sh...@gmail.com> on 2015/07/01 18:59:25 UTC

Experiencing Timeouts on one node

We have a 28-node cluster in which only one node is experiencing timeouts.
We thought it was the RAID, but two other nodes on the same RAID have no
problems. The problem goes away if we reboot the node and then reappears
after seven days. The hinted handoff timeouts below are seen on the node
experiencing the timeouts. We did not notice any gossip errors.

I was wondering if anyone has seen this issue and how they resolved it.

Cassandra Version: 1.2.15.1
OS: Linux cm 2.6.32-504.8.1.el6.x86_64 #1 SMP Fri Dec 19 12:09:25 EST 2014
x86_64 x86_64 x86_64 GNU/Linux
java version "1.6.0_85"

------------------------------------------------------------------------------------------------------------------------------------
INFO [HintedHandoff:2] 2015-06-17 22:52:08,130 HintedHandOffManager.java (line 296) Started hinted handoff for host: 4fe86051-6bca-4c28-b09c-1b0f073c1588 with IP: /192.168.1.122
INFO [HintedHandoff:1] 2015-06-17 22:52:08,131 HintedHandOffManager.java (line 296) Started hinted handoff for host: bbf0878b-b405-4518-b649-f6cf7c9a6550 with IP: /192.168.1.119
INFO [HintedHandoff:2] 2015-06-17 22:52:17,634 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.122; aborting (0 delivered)
INFO [HintedHandoff:2] 2015-06-17 22:52:17,635 HintedHandOffManager.java (line 296) Started hinted handoff for host: f7b7ab10-4d42-4f0c-af92-2934a075bee3 with IP: /192.168.1.108
INFO [HintedHandoff:1] 2015-06-17 22:52:17,643 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.119; aborting (0 delivered)
INFO [HintedHandoff:1] 2015-06-17 22:52:17,643 HintedHandOffManager.java (line 296) Started hinted handoff for host: ddb79f35-3e2b-4be8-84d8-7942086e2b73 with IP: /192.168.1.104
INFO [HintedHandoff:2] 2015-06-17 22:52:27,143 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.108; aborting (0 delivered)
INFO [HintedHandoff:2] 2015-06-17 22:52:27,144 HintedHandOffManager.java (line 296) Started hinted handoff for host: 6a2fa431-4a51-44cb-af19-1991c960e075 with IP: /192.168.1.117
INFO [HintedHandoff:1] 2015-06-17 22:52:27,153 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.104; aborting (0 delivered)
INFO [HintedHandoff:1] 2015-06-17 22:52:27,154 HintedHandOffManager.java (line 296) Started hinted handoff for host: cf03174a-533c-44d6-a679-e70090ad2bc5 with IP: /192.168.1.107
------------------------------------------------------------------------------------------------------------------------------------

Thanks
-shashi..
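
To put numbers on the excerpt above, here is a minimal Python sketch that
pairs each "Started hinted handoff" entry with the matching "Timed out"
entry on the same thread and reports how long the attempt ran. It assumes
every log entry sits on a single line; the function and variable names are
illustrative and not part of Cassandra. On the entries from this post, every
attempt aborts roughly 9.5 seconds after it starts, with 0 hints delivered.

```python
import re
from datetime import datetime

# Entries copied from the excerpt in this post, one entry per line.
LOG = """\
INFO [HintedHandoff:2] 2015-06-17 22:52:08,130 HintedHandOffManager.java (line 296) Started hinted handoff for host: 4fe86051-6bca-4c28-b09c-1b0f073c1588 with IP: /192.168.1.122
INFO [HintedHandoff:1] 2015-06-17 22:52:08,131 HintedHandOffManager.java (line 296) Started hinted handoff for host: bbf0878b-b405-4518-b649-f6cf7c9a6550 with IP: /192.168.1.119
INFO [HintedHandoff:2] 2015-06-17 22:52:17,634 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.122; aborting (0 delivered)
INFO [HintedHandoff:2] 2015-06-17 22:52:17,635 HintedHandOffManager.java (line 296) Started hinted handoff for host: f7b7ab10-4d42-4f0c-af92-2934a075bee3 with IP: /192.168.1.108
INFO [HintedHandoff:1] 2015-06-17 22:52:17,643 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.119; aborting (0 delivered)
INFO [HintedHandoff:1] 2015-06-17 22:52:17,643 HintedHandOffManager.java (line 296) Started hinted handoff for host: ddb79f35-3e2b-4be8-84d8-7942086e2b73 with IP: /192.168.1.104
INFO [HintedHandoff:2] 2015-06-17 22:52:27,143 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.108; aborting (0 delivered)
INFO [HintedHandoff:1] 2015-06-17 22:52:27,153 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.104; aborting (0 delivered)
"""

STARTED = re.compile(
    r"\[(HintedHandoff:\d+)\] (\S+ \S+) .*Started hinted handoff for host: \S+ with IP: /([\d.]+)"
)
TIMED_OUT = re.compile(
    r"\[(HintedHandoff:\d+)\] (\S+ \S+) .*Timed out replaying hints to /([\d.]+); aborting \((\d+) delivered\)"
)


def parse_ts(text):
    """Parse a timestamp such as '2015-06-17 22:52:08,130'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def handoff_durations(log_text):
    """Return (target_ip, seconds_until_timeout, hints_delivered) tuples."""
    started = {}  # (thread, ip) -> time the attempt started
    results = []
    for line in log_text.splitlines():
        match = STARTED.search(line)
        if match:
            thread, ts, ip = match.groups()
            started[(thread, ip)] = parse_ts(ts)
            continue
        match = TIMED_OUT.search(line)
        if match:
            thread, ts, ip, delivered = match.groups()
            begin = started.pop((thread, ip), None)
            if begin is not None:
                elapsed = (parse_ts(ts) - begin).total_seconds()
                results.append((ip, elapsed, int(delivered)))
    return results


if __name__ == "__main__":
    for ip, seconds, delivered in handoff_durations(LOG):
        print(f"{ip}: timed out after {seconds:.3f}s, {delivered} delivered")
```

Running the same function over the full system.log of a healthy node would
show whether the pattern is specific to the problematic node.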

Re: Experiencing Timeouts on one node

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi,

I am not sure what is happening (I have never seen this error before). Yet
from https://github.com/apache/cassandra/blob/cassandra-1.2/CHANGES.txt it
looks like some bugs were fixed in later revisions of 1.2.x.

I would advise you to upgrade to the latest 1.2 release, 1.2.19 (it is an
old and stable version; I see no reason not to do it).

"The problem goes away if we reboot the node, and then reappears after
seven days"
--> Do you have TTLs on any table (set to 7 days)?
--> Do you see any GC warnings or heap pressure?

C*heers,

Alain



2015-07-02 16:20 GMT+02:00 Shashi Yachavaram <sh...@gmail.com>:

> Jason,
>
> The load was evenly distributed. And regarding network connectivity, our
> applications were successfully able to connect to the node, but the read
> and write operations were timing out. Also we were able to ssh to this
> node.
>
> I just pasted the output of "/bin/nodetool -h node version" and "java -version".
>
> Thanks
> shashi
>
> On Thu, Jul 2, 2015 at 8:42 AM, Jason Wee <pe...@gmail.com> wrote:
>
>> You should check the network connectivity for this node and also its
>> system load average. Is that a typo, or is it literally what you have:
>> Cassandra 1.2.15.*1* and Java 6 update *85*?

Re: Experiencing Timeouts on one node

Posted by Shashi Yachavaram <sh...@gmail.com>.
Jason,

The load was evenly distributed. And regarding network connectivity, our
applications were successfully able to connect to the node, but the read
and write operations were timing out. Also we were able to ssh to this
node.

I just pasted the output of "/bin/nodetool -h node version" and "java -version".

Thanks
shashi

On Thu, Jul 2, 2015 at 8:42 AM, Jason Wee <pe...@gmail.com> wrote:

> You should check the network connectivity for this node and also its
> system load average. Is that a typo, or is it literally what you have:
> Cassandra 1.2.15.*1* and Java 6 update *85*?

Re: Experiencing Timeouts on one node

Posted by Jason Wee <pe...@gmail.com>.
You should check the network connectivity for this node and also its system
load average. Is that a typo, or is it literally what you have: Cassandra
1.2.15.*1* and Java 6 update *85*?
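
A minimal Python sketch of the two checks suggested here: whether a TCP
connection to the node can be opened, and what the system load average is
when run on the node itself. The helper names are hypothetical. Ports 7000
(inter-node storage) and 9160 (Thrift) are Cassandra's defaults and are
assumed here; the values in your cassandra.yaml may differ. A successful
connect only shows that the port accepts connections; it does not rule out
read and write requests timing out afterwards.

```python
import os
import socket

# Cassandra's default ports; the cluster's cassandra.yaml may override them.
DEFAULT_PORTS = {"storage (inter-node)": 7000, "thrift rpc": 9160}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False


def load_report():
    """Return the 1/5/15-minute load averages and the 1-minute load per core."""
    one, five, fifteen = os.getloadavg()  # available on Unix-like systems only
    cores = os.cpu_count() or 1
    return {
        "load_1m": one,
        "load_5m": five,
        "load_15m": fifteen,
        "cores": cores,
        "load_1m_per_core": one / cores,
    }


# Example usage (the IP is just one of the addresses from the log excerpt):
#   for name, port in DEFAULT_PORTS.items():
#       print(name, port, port_is_open("192.168.1.122", port))
#   print(load_report())
```

Comparing the output for the problematic node against a healthy node on the
same RAID is more informative than either result on its own.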




Re: Experiencing Timeouts on one node

Posted by Jason Wee <pe...@gmail.com>.
3. How do we rebuild the System keyspace?

Wipe this node and start it all over again.

hth

jason

On Tue, Jul 7, 2015 at 12:16 AM, Shashi Yachavaram <sh...@gmail.com>
wrote:

> When we reboot the problematic node, we see the following errors in
> system.log.
>
> 1. Does this mean the hints column family is corrupted?
> 2. Can we scrub the system column family on the problematic node and its
> replication partners?
> 3. How do we rebuild the System keyspace?
>

Re: Experiencing Timeouts on one node

Posted by Shashi Yachavaram <sh...@gmail.com>.
When we reboot the problematic node, we see the following errors in
system.log.

1. Does this mean the hints column family is corrupted?
2. Can we scrub the system column family on the problematic node and its
replication partners?
3. How do we rebuild the System keyspace?

==================================================================
ERROR [CompactionExecutor:950] 2015-06-27 20:11:44,595 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:950,1,main]
java.lang.AssertionError: originally calculated column size of 8684 but now it is 15725
    at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
ERROR [HintedHandoff:552] 2015-06-27 20:11:44,595 CassandraDaemon.java (line 191) Exception in thread Thread[HintedHandoff:552,1,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError: originally calculated column size of 8684 but now it is 15725
    at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:436)
    at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282)
    at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
    at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:502)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError: originally calculated column size of 8684 but now it is 15725
    at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
    at java.util.concurrent.FutureTask.get(Unknown Source)
    at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:432)
    ... 6 more
Caused by: java.lang.AssertionError: originally calculated column size of 8684 but now it is 15725
    at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
    at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
==================================================================
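
When a system.log contains many traces like the ones above, a short script
can show whether they share a root cause. The sketch below is a hedged
illustration, not a Cassandra tool: it lists the threads that logged an
ERROR and the distinct "originally calculated column size" mismatches. The
sample text is abridged from this message (stack frames omitted), and the
regular expressions tolerate the line wrapping introduced by mail clients.
On this sample, both the CompactionExecutor and the HintedHandoff errors
report the same single mismatch.

```python
import re

# Abridged from the trace in this message; stack frames are omitted.
SAMPLE = """\
ERROR [CompactionExecutor:950] 2015-06-27 20:11:44,595 CassandraDaemon.java
(line 191) Exception in thread Thread[CompactionExecutor:950,1,main]
java.lang.AssertionError: originally calculated column size of 8684 but now
it is 15725
ERROR [HintedHandoff:552] 2015-06-27 20:11:44,595 CassandraDaemon.java
(line 191) Exception in thread Thread[HintedHandoff:552,1,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException:
java.lang.AssertionError: originally calculated column size of 8684 but now
it is 15725
Caused by: java.lang.AssertionError: originally calculated column size of
8684 but now it is 15725
"""

# Thread name inside the brackets of each ERROR entry.
ERROR_HEADER = re.compile(r"^ERROR \[([^\]]+)\]", re.MULTILINE)

# \s+ between words lets the pattern match across wrapped lines.
SIZE_MISMATCH = re.compile(
    r"originally\s+calculated\s+column\s+size\s+of\s+(\d+)\s+but\s+now\s+it\s+is\s+(\d+)"
)


def summarize(log_text):
    """Return (threads that logged ERROR, sorted distinct size mismatches)."""
    threads = ERROR_HEADER.findall(log_text)
    mismatches = sorted(
        {(int(before), int(after)) for before, after in SIZE_MISMATCH.findall(log_text)}
    )
    return threads, mismatches


if __name__ == "__main__":
    threads, mismatches = summarize(SAMPLE)
    print("threads with errors:", threads)
    print("distinct size mismatches:", mismatches)
```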

