Posted to user@cassandra.apache.org by ad...@panasiangroup.com on 2013/04/12 14:17:34 UTC

Repair hangs on 1.1.4

Hi,

I started a repair with -pr on a newly added node, which is in another
data center. I have a 5 MB internet connection and have configured
setstreamthroughput 1. After some time the repair hangs, and the
following messages are found in the logs:

# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring
Address         DC          Rack        Status State   Load             Effective-Ownership Token
                                                                                            169417178424467235000914166253263322299
10.0.0.3        DC1         RAC1        Up     Normal  93.26 GB         66.67%              0
10.0.0.4        DC1         RAC1        Up     Normal  89.1 GB          66.67%              56713727820156410577229101238628035242
10.0.0.15       DC1         RAC1        Up     Normal  72.87 GB         66.67%              113427455640312821154458202477256070484
10.40.1.103     DC2         RAC1        Up     Normal  48.59 GB         100.00%             169417178424467235000914166253263322299


 INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java (line 372) Timed out replaying hints to /10.40.1.103; aborting further deliveries
 INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java (line 390) Finished hinted handoff of 0 rows to endpoint /10.40.1.103

Why are we getting this message, and how can I prevent the repair from hitting this error?

Regards,

Adeel Akbar

Re: Repair hangs on 1.1.4

Posted by aaron morton <aa...@thelastpickle.com>.

Looks like there are no repairs running. Is this just an issue with OpsCenter?

Try restarting the OpsCenter agent: sudo service opscenter-agent restart

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 5:41 PM, adeel.akbar@panasiangroup.com wrote:

> Hi Aaron,
> 
> Thank you for your feedback. I have also installed DataStax OpsCenter, and it shows no repair progress. Previously, OpsCenter showed the progress of every repair, and once it reached 100% the repair had also completed on the nodes. Now a repair is in progress on the node, but OpsCenter shows nothing. Please also find the netstats and compactionstats results below:
> 
> # /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost netstats
> Mode: NORMAL
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0        5327870
> Responses                       n/a         0      163271943
> 
> # /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost compactionstats
> pending tasks: 0
> Active compaction remaining time :        n/a
> 
> Regards,
> 
> Adeel Akbar

Re: Repair hangs on 1.1.4

Posted by ad...@panasiangroup.com.

Hi Aaron,

Thank you for your feedback. I have also installed DataStax OpsCenter,
and it shows no repair progress. Previously, OpsCenter showed the
progress of every repair, and once it reached 100% the repair had also
completed on the nodes. Now a repair is in progress on the node, but
OpsCenter shows nothing. Please also find the netstats and
compactionstats results below:

# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost netstats
Mode: NORMAL
Not sending any streams.
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0        5327870
Responses                       n/a         0      163271943

# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost compactionstats
pending tasks: 0
Active compaction remaining time :        n/a

Regards,

Adeel Akbar


Re: Repair hangs on 1.1.4

Posted by aaron morton <aa...@thelastpickle.com>.

The errors from hints are not related to repair. Increasing the rpc_timeout may help with those. If it's logging about 0 hints, you may be seeing this: https://issues.apache.org/jira/browse/CASSANDRA-5068

How did the repair hang? Check for progress with nodetool compactionstats and nodetool netstats.
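
As a rough illustration of that check, here is a small Python sketch that reads the two command outputs and reports whether anything repair-related still appears to be running. It assumes the 1.1.4 text formats posted in this thread ("Not sending any streams.", "Not receiving any streams.", "pending tasks: N"); the function name repair_looks_idle is made up for the example, and other Cassandra versions may print different text.

```python
def repair_looks_idle(netstats: str, compactionstats: str) -> bool:
    """Return True when neither output shows streaming or pending compaction work.

    Hypothetical helper: it only matches the text formats seen in this thread
    (Cassandra 1.1.4) and is not part of nodetool or any Cassandra API.
    """
    net_lines = [line.strip() for line in netstats.splitlines()]
    not_sending = "Not sending any streams." in net_lines
    not_receiving = "Not receiving any streams." in net_lines

    # Look for the "pending tasks: N" line in the compactionstats output.
    pending = None
    for line in compactionstats.splitlines():
        text = line.strip().lower()
        if text.startswith("pending tasks:"):
            pending = int(text.split(":", 1)[1].strip())

    # If the pending-tasks line is missing, do not claim the node is idle.
    return not_sending and not_receiving and pending == 0


if __name__ == "__main__":
    import subprocess

    # Assumption: nodetool is on PATH; adjust the path and host for your install.
    net = subprocess.run(["nodetool", "-h", "localhost", "netstats"],
                         capture_output=True, text=True).stdout
    comp = subprocess.run(["nodetool", "-h", "localhost", "compactionstats"],
                          capture_output=True, text=True).stdout
    print("idle" if repair_looks_idle(net, comp) else "work in progress")
```

If this reports idle while the repair command has still not returned, that may point to a stalled session rather than one that is merely slow over a throttled link.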

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 3:01 AM, Alexis Rodríguez <ar...@inconcertcc.com> wrote:

> Adeel,
> 
> It may be a problem on the remote node; could you check its system.log?
> 
> Also, you might want to check rpc_timeout_in_ms on both nodes; increasing this parameter may help.

Re: Repair hangs on 1.1.4

Posted by Alexis Rodríguez <ar...@inconcertcc.com>.

Adeel,

It may be a problem on the remote node; could you check its system.log?

Also, you might want to check rpc_timeout_in_ms on both nodes; increasing
this parameter may help.




