Posted to user@cassandra.apache.org by Anthony Grasso <an...@gmail.com> on 2017/05/01 00:27:28 UTC

Re: Very slow cluster

Hi Eduardo,

Please see my comment inline below regarding your third question.

Regards,
Anthony

On 28 April 2017 at 21:26, Eduardo Alonso <ed...@stratio.com> wrote:

> Hi all,
>
> I am having some problems with two of a client's cassandra:3.0.8 clusters
> that I want to share with you. These clusters are for QA and DEV.
>
> Cluster 1 (1 DC) is composed of 3 VMs (heap=4G, RAM=8G) sharing the
> same physical machine and one SSD. I know this is not the best
> environment, but it is only for testing purposes.
>
> The entire cluster runs very slowly, and some inserts occasionally fail,
> which causes hints to be saved and replayed and leads to some data
> inconsistency with 2i queries.
>
> I know it is not the best environment (virtual machines sharing a physical
> machine and one physical disk), but it seems very odd to me that exactly the
> same test case works like a charm in 3 Docker containers on my
> laptop (i7, 16G, SSD) but causes a lot of problems in their cluster.
>
> *listen_address* and *rpc_address* are set to the external domain name (i.e.
> NODE_NAME.clientdomain.com). I have enabled TRACE logging and see some
> strange messages.
>
> So, my questions:
>
> *1.- Is it possible for one node (with ) to send a message to itself that
> triggers READ_REPAIR?*
>
> TRACE [SharedPool-Worker-1] 2017-04-24 08:58:28,558
> MessagingService.java:750 - Message-to-self TYPE:MUTATION VERB:READ_REPAIR going
> over MessagingService
>
>     TRACE [SharedPool-Worker-1] 2017-04-16 04:38:47,513
> MessagingService.java:747 -01a.clientdomain.com/10.63.24.238 sending
> READ_REPAIR to 3426@/10.63.24.238"
>
> *Does this log line show one node asking itself for a portion of data
> that it does not have?*
>
> *2.-* I have another suspicious log line about slow VMs:
>
> -WARN  [GossipTasks:1] 2017-04-14 00:32:44,371 FailureDetector.java:287 -
> Not marking nodes down due to local pause of 11195193520 > 5000000000
>
> *Does this line say that there was a JVM pause of 11 seconds?* There are
> no garbage collector log lines. *Is it possible that this 11-second pause
> was caused by a DNS lookup of the domain?*
>
>
> *3.-* I know that listen_address must be the external IP (inter-node
> communication will be faster, with no need for a DNS lookup).
>
> *If I set listen_address to the external IP, does that IP need to be
> pingable from all the other datacenter nodes?*
> *Does inter-datacenter communication use 'rpc_address' or
> 'listen_address'?*
>
>
All nodes in the cluster should be configured so that they can contact each
other. As for being able to ping each other, enabling ICMP can be useful
for debugging internode communication problems.
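
As a rough illustration of checking that nodes can contact each other
without relying on ICMP, here is a minimal Python sketch that attempts a TCP
connection to each peer. It assumes the cluster uses Cassandra's default
storage_port of 7000; the function names and any peer addresses you pass in
are hypothetical and not part of Cassandra.

```python
import socket


def can_connect(host, port=7000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 7000 is assumed here because it is Cassandra's default
    storage_port; adjust it if cassandra.yaml says otherwise.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_peers(peers, port=7000, timeout=3.0):
    """Map each peer address to whether it accepted a TCP connection."""
    return {peer: can_connect(peer, port, timeout) for peer in peers}
```

Running check_peers from every node with the listen_address values of the
other nodes shows which pairs cannot reach each other, even when ICMP is
blocked.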

Regarding internode communication: *listen_address* is the address used for
internode communication in the cluster. Note that if you don't want to
manually specify an IP for *listen_address* on each node in your cluster,
leave it blank and Cassandra will use *InetAddress.getLocalHost()* to pick
an address.
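
To see what a hostname-based lookup returns on a node and how long it takes,
something like the following Python sketch may help. It is only an analogue
of *InetAddress.getLocalHost()*, not Cassandra's own code, and the function
names are hypothetical. The threshold constant is copied from the quoted
FailureDetector log line, on the assumption that both values there are
nanoseconds, which would make the logged pause about 11.2 seconds against a
5 second limit.

```python
import socket
import time

# Threshold taken from the quoted log line, assumed to be nanoseconds.
MAX_LOCAL_PAUSE_NS = 5_000_000_000


def ns_to_seconds(ns):
    """Convert nanoseconds to seconds."""
    return ns / 1_000_000_000


def timed_lookup(hostname=None, resolver=socket.gethostbyname):
    """Resolve hostname (default: this machine's name) and time the lookup.

    Returns (name, address or None, elapsed nanoseconds).
    """
    name = hostname if hostname is not None else socket.gethostname()
    start = time.perf_counter_ns()
    try:
        address = resolver(name)
    except OSError:
        # socket.gaierror is a subclass of OSError.
        address = None
    elapsed_ns = time.perf_counter_ns() - start
    return name, address, elapsed_ns
```

Calling timed_lookup() on each VM and comparing the elapsed time with
MAX_LOCAL_PAUSE_NS gives a quick indication of whether name resolution is
slow enough to be worth investigating. A slow lookup would not prove that
DNS caused the pause in the log.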


> Thank you in advance
>
> Eduardo Alonso
> Vía de las dos Castillas, 33, Ática 4, 3ª Planta
> 28224 Pozuelo de Alarcón, Madrid
> Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
> <https://twitter.com/StratioBD>*
>

Re: Very slow cluster

Posted by Eduardo Alonso <ed...@stratio.com>.

Thank you Anthony.

