Posted to user@cassandra.apache.org by ad...@panasiangroup.com on 2013/04/12 14:17:34 UTC
Repair hangs on 1.1.4
Hi,
I started a repair with -pr on a newly added node, which is in another
data center. I have a 5MB internet connection and have configured
setstreamthroughput 1. After some time the repair hangs, and the
following messages appear in the logs:
# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring
Address      DC   Rack  Status  State   Load      Effective-Ownership  Token
                                                                       169417178424467235000914166253263322299
10.0.0.3     DC1  RAC1  Up      Normal  93.26 GB  66.67%               0
10.0.0.4     DC1  RAC1  Up      Normal  89.1 GB   66.67%               56713727820156410577229101238628035242
10.0.0.15    DC1  RAC1  Up      Normal  72.87 GB  66.67%               113427455640312821154458202477256070484
10.40.1.103  DC2  RAC1  Up      Normal  48.59 GB  100.00%              169417178424467235000914166253263322299
INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java (line 372) Timed out replaying hints to /10.40.1.103; aborting further deliveries
INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java (line 390) Finished hinted handoff of 0 rows to endpoint /10.40.1.103
Why are we getting this message, and how can I prevent this error during repair?
Regards,
Adeel Akbar
Re: Repair hangs on 1.1.4
Posted by aaron morton <aa...@thelastpickle.com>.
Looks like there are no repairs running. Is this just an issue with OpsCenter?
Try restarting the OpsCenter agent: sudo service opscenter-agent restart
Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 18/04/2013, at 5:41 PM, adeel.akbar@panasiangroup.com wrote:
> Hi Aaron,
>
> Thank you for your feedback. I have also installed DataStax OpsCenter, and it shows nothing about the progress of the repair. Previously, the progress of every repair was shown in OpsCenter, and once it reached 100% the repair had also completed on the nodes. Now a repair is in progress on the node, but OpsCenter shows nothing. Also, please find the netstats and compactionstats results below:
>
> # /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost netstats
> Mode: NORMAL
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0        5327870
> Responses                       n/a         0      163271943
>
> # /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost compactionstats
> pending tasks: 0
> Active compaction remaining time : n/a
>
> Regards,
>
> Adeel Akbar
Re: Repair hangs on 1.1.4
Posted by ad...@panasiangroup.com.
Hi Aaron,
Thank you for your feedback. I have also installed DataStax OpsCenter,
and it shows nothing about the progress of the repair. Previously, the
progress of every repair was shown in OpsCenter, and once it reached
100% the repair had also completed on the nodes. Now a repair is in
progress on the node, but OpsCenter shows nothing. Also, please find
the netstats and compactionstats results below:
# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost netstats
Mode: NORMAL
Not sending any streams.
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0        5327870
Responses                       n/a         0      163271943
# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost compactionstats
pending tasks: 0
Active compaction remaining time : n/a
Regards,
Adeel Akbar
Quoting aaron morton <aa...@thelastpickle.com>:
> The errors from hints are not related to repair. Increasing
> rpc_timeout_in_ms may help with those. If it's logging about 0 hints,
> you may be seeing this:
> https://issues.apache.org/jira/browse/CASSANDRA-5068
>
> How did the repair hang? Check for progress with nodetool
> compactionstats and nodetool netstats.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
Re: Repair hangs on 1.1.4
Posted by aaron morton <aa...@thelastpickle.com>.
The errors from hints are not related to repair. Increasing rpc_timeout_in_ms may help with those. If it's logging about 0 hints, you may be seeing this: https://issues.apache.org/jira/browse/CASSANDRA-5068
How did the repair hang? Check for progress with nodetool compactionstats and nodetool netstats.
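As a rough illustration of that check, here is a minimal Python sketch that reads the text printed by nodetool netstats and nodetool compactionstats and reports whether anything appears to be in flight. The function names are hypothetical, and the sample strings are copied from the output posted in this thread. It assumes the 1.1.x output format shown there and is only a heuristic, not a definitive test that a repair has finished.

```python
import re


def streams_active(netstats_output: str) -> bool:
    """Heuristic: streaming is considered active unless both idle lines are present."""
    idle_markers = ("Not sending any streams.", "Not receiving any streams.")
    return not all(marker in netstats_output for marker in idle_markers)


def pending_compactions(compactionstats_output: str) -> int:
    """Extract the number after 'pending tasks:' from compactionstats text."""
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def repair_looks_idle(netstats_output: str, compactionstats_output: str) -> bool:
    """True when no streams are reported and no compaction tasks are pending."""
    return (not streams_active(netstats_output)
            and pending_compactions(compactionstats_output) == 0)


# Sample text taken from the nodetool output posted in this thread.
NETSTATS = """Mode: NORMAL
Not sending any streams.
Not receiving any streams.
"""
COMPACTIONSTATS = """pending tasks: 0
Active compaction remaining time : n/a
"""

print(repair_looks_idle(NETSTATS, COMPACTIONSTATS))
```

In practice the two strings would come from running the nodetool commands on the node being repaired, a few minutes apart, to see whether the numbers change.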
Cheers
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 13/04/2013, at 3:01 AM, Alexis Rodríguez <ar...@inconcertcc.com> wrote:
> Adeel,
>
> It may be a problem on the remote node. Could you check its system.log?
>
> Also, you might want to check rpc_timeout_in_ms on both nodes; increasing this parameter may help.
Re: Repair hangs on 1.1.4
Posted by Alexis Rodríguez <ar...@inconcertcc.com>.
Adeel,
It may be a problem on the remote node. Could you check its system.log?
Also, you might want to check rpc_timeout_in_ms on both nodes; increasing
this parameter may help.
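One way to compare the setting on both nodes is sketched below in Python. It pulls rpc_timeout_in_ms out of the text of each node's cassandra.yaml with a regular expression, so it needs no YAML library. The function name and the two sample config strings (including their values) are hypothetical, made up only for illustration.

```python
import re


def read_timeout_ms(yaml_text: str, key: str = "rpc_timeout_in_ms"):
    """Return the integer value of `key`, or None if it is absent or commented out."""
    pattern = rf"^\s*{re.escape(key)}\s*:\s*(\d+)"
    match = re.search(pattern, yaml_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Hypothetical excerpts standing in for the cassandra.yaml on each node.
LOCAL_YAML = "cluster_name: 'Example'\nrpc_timeout_in_ms: 10000\n"
REMOTE_YAML = "cluster_name: 'Example'\nrpc_timeout_in_ms: 20000\n"

local_ms = read_timeout_ms(LOCAL_YAML)
remote_ms = read_timeout_ms(REMOTE_YAML)

if local_ms != remote_ms:
    print(f"rpc_timeout_in_ms differs: local={local_ms}, remote={remote_ms}")
else:
    print(f"rpc_timeout_in_ms matches on both nodes: {local_ms}")
```

A changed value in cassandra.yaml only takes effect after the node is restarted.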
On Fri, Apr 12, 2013 at 9:17 AM, <ad...@panasiangroup.com> wrote:
> Hi,
>
> I started a repair with -pr on a newly added node, which is in another
> data center. I have a 5MB internet connection and have configured
> setstreamthroughput 1. After some time the repair hangs, and the
> following messages appear in the logs:
>
> # /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring
> Address      DC   Rack  Status  State   Load      Effective-Ownership  Token
>                                                                        169417178424467235000914166253263322299
> 10.0.0.3     DC1  RAC1  Up      Normal  93.26 GB  66.67%               0
> 10.0.0.4     DC1  RAC1  Up      Normal  89.1 GB   66.67%               56713727820156410577229101238628035242
> 10.0.0.15    DC1  RAC1  Up      Normal  72.87 GB  66.67%               113427455640312821154458202477256070484
> 10.40.1.103  DC2  RAC1  Up      Normal  48.59 GB  100.00%              169417178424467235000914166253263322299
>
>
> INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java (line 372) Timed out replaying hints to /10.40.1.103; aborting further deliveries
> INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java (line 390) Finished hinted handoff of 0 rows to endpoint /10.40.1.103
>
> Why are we getting this message, and how can I prevent this error during repair?
>
> Regards,
>
> Adeel Akbar
>