Posted to user@cassandra.apache.org by Hiroyuki Yamada <mo...@gmail.com> on 2019/05/13 01:58:39 UTC

Re: A cluster (RF=3) not recovering after two nodes are stopped

Hi,

Should I file a bug?
This doesn't seem to be expected behavior,
so I think it should at least be documented somewhere.

Thanks,
Hiro


On Fri, Apr 26, 2019 at 3:17 PM Hiroyuki Yamada <mo...@gmail.com> wrote:

> Hello,
>
> Thank you for the feedback.
>
> >Ben
> Thank you.
> I've tested with lower concurrency on my side, and the issue still occurs.
> We are using 3 x t3.xlarge instances for C* and a small, separate
> instance for the client program.
> However, when we tried a single host running 3 C* nodes, the issue didn't occur.
>
> > Alok
> We also suspected that and tested with hints disabled, but it doesn't make
> any difference (the issue still occurs).
>
> Thanks,
> Hiro
>
>
>
>
> On Fri, Apr 26, 2019 at 8:19 AM Alok Dwivedi <al...@instaclustr.com>
> wrote:
>
>> Could it be related to hinted handoffs being stored on node1 and then
>> replayed to node2 when it comes back, causing more load while new
>> mutations are also being applied by cassandra-stress at the same time?
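>>
>> If you want to rule hints out quickly, something like the following should
>> work (a rough sketch only - the node1-3 hostnames and ssh access are
>> assumptions about the test setup):
>>
>>     for host in node1 node2 node3; do
>>       ssh "$host" nodetool disablehandoff   # stop storing new hints
>>       ssh "$host" nodetool truncatehints    # drop hints already on disk
>>       ssh "$host" nodetool statushandoff    # confirm handoff is now disabled
>>     done
>>
>> and then re-run the scenario with hints out of the picture.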
>>
>> Alok Dwivedi
>> Senior Consultant
>> https://www.instaclustr.com/
>>
>>
>>
>>
>> On 26 Apr 2019, at 09:04, Ben Slater <be...@instaclustr.com> wrote:
>>
>> In the absence of anyone else having any bright ideas - it still sounds
>> to me like the kind of scenario that can occur in a heavily overloaded
>> cluster. I would try again with a lower load.
>>
>> What size machines are you using for the stress client and the nodes? Are
>> they all on separate machines?
>>
>> Cheers
>> Ben
>>
>> ---
>>
>>
>> *Ben Slater*
>> *Chief Product Officer*
>>
>>
>>
>> On Thu, 25 Apr 2019 at 17:26, Hiroyuki Yamada <mo...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> Sorry again.
>>> We found yet another weird thing here.
>>> If we stop nodes with systemctl stop or a plain kill (SIGTERM), the problem
>>> occurs, but if we kill -9 (SIGKILL), it doesn't.
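>>>
>>> (For clarity, this is roughly how we stop a node in the two cases and how we
>>> compare the views afterwards; the "cassandra" service name and ssh access are
>>> specific to our setup:)
>>>
>>>     # graceful stop - the case that triggers the problem for us
>>>     ssh node2 sudo systemctl stop cassandra
>>>
>>>     # hard kill - this case does NOT trigger the problem for us
>>>     ssh node2 'sudo kill -9 $(pgrep -f CassandraDaemon)'
>>>
>>>     # after bringing node2 back up, compare what each side sees
>>>     ssh node1 nodetool status
>>>     ssh node2 nodetool status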
>>>
>>> Thanks,
>>> Hiro
>>>
>>> On Wed, Apr 24, 2019 at 11:31 PM Hiroyuki Yamada <mo...@gmail.com>
>>> wrote:
>>>
>>>> Sorry, I didn't mention the version and the configuration.
>>>> I've tested with C* 3.11.4, and
>>>> the configuration is mostly default except for the replication
>>>> factor and listen_address (set for proper networking).
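>>>>
>>>> (To be concrete, a rough sketch of what that amounts to; the cassandra.yaml
>>>> path and the IP below are just examples from our environment:)
>>>>
>>>>     # per node, the main non-default line points at that node's own IP
>>>>     grep '^listen_address:' /etc/cassandra/cassandra.yaml
>>>>     # listen_address: 10.0.0.11
>>>>
>>>>     # RF=3 for the stress keyspace can be set via the stress schema option, e.g.:
>>>>     cassandra-stress write n=1 cl=QUORUM -schema 'replication(factor=3)' \
>>>>         -node node1,node2,node3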
>>>>
>>>> Thanks,
>>>> Hiro
>>>>
>>>> On Wed, Apr 24, 2019 at 5:12 PM Hiroyuki Yamada <mo...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello Ben,
>>>>>
>>>>> Thank you for the quick reply.
>>>>> I haven't tried that case, but it doesn't recover even if I stop the
>>>>> stress.
>>>>>
>>>>> Thanks,
>>>>> Hiro
>>>>>
>>>>> On Wed, Apr 24, 2019 at 3:36 PM Ben Slater <be...@instaclustr.com>
>>>>> wrote:
>>>>>
>>>>>> Is it possible that stress is overloading node 1 so it’s not
>>>>>> recovering state properly when node 2 comes up? Have you tried running with
>>>>>> a lower load (say 2 or 3 threads)?
>>>>>>
>>>>>> Cheers
>>>>>> Ben
>>>>>>
>>>>>> ---
>>>>>>
>>>>>>
>>>>>> *Ben Slater*
>>>>>> *Chief Product Officer*
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, 24 Apr 2019 at 16:28, Hiroyuki Yamada <mo...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I faced a weird issue when recovering a cluster after two nodes are
>>>>>>> stopped.
>>>>>>> It is easily reproducible and looks like a bug or at least an issue to fix,
>>>>>>> so let me write down the steps to reproduce (a rough shell sketch of the
>>>>>>> same sequence follows the list).
>>>>>>>
>>>>>>> === STEPS TO REPRODUCE ===
>>>>>>> * Create a 3-node cluster with RF=3
>>>>>>>    - node1(seed), node2, node3
>>>>>>> * Start requests to the cluster with cassandra-stress (it keeps running
>>>>>>> for the whole duration)
>>>>>>>    - what we did: cassandra-stress mixed cl=QUORUM duration=10m
>>>>>>> -errors ignore -node node1,node2,node3 -rate threads\>=16
>>>>>>> threads\<=256
>>>>>>> * Stop node3 normally (with systemctl stop)
>>>>>>>    - the system is still available because a quorum of replicas is
>>>>>>> still available
>>>>>>> * Stop node2 normally (with systemctl stop)
>>>>>>>    - the system is NOT available after it's stopped.
>>>>>>>    - the client gets `UnavailableException: Not enough replicas
>>>>>>> available for query at consistency QUORUM`
>>>>>>>    - the client gets the errors right away (within a few ms)
>>>>>>>    - so far this is all expected
>>>>>>> * Wait for 1 minute
>>>>>>> * Bring up node2
>>>>>>>    - The issue happens here.
>>>>>>>    - the client gets `ReadTimeoutException` or `WriteTimeoutException`
>>>>>>> depending on whether the request is a read or a write, even after node2
>>>>>>> is up
>>>>>>>    - the client gets errors after about 5000 ms (read) or 2000 ms (write),
>>>>>>> which are the default request timeouts
>>>>>>>    - what node1 reports with `nodetool status` and what node2 reports
>>>>>>> are not consistent (node2 thinks node1 is down)
>>>>>>>    - it takes a very long time to recover from this state
>>>>>>> === END STEPS TO REPRODUCE ===
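>>>>>>>
>>>>>>> For convenience, here is the same sequence as a rough shell sketch (the
>>>>>>> node1-3 hostnames, ssh access and the "cassandra" service name are
>>>>>>> placeholders from our environment):
>>>>>>>
>>>>>>>     # 1. start the load and leave it running for the whole test
>>>>>>>     cassandra-stress mixed cl=QUORUM duration=10m -errors ignore \
>>>>>>>         -node node1,node2,node3 -rate 'threads>=16' 'threads<=256' &
>>>>>>>
>>>>>>>     # 2. gracefully stop two of the three replicas
>>>>>>>     ssh node3 sudo systemctl stop cassandra   # QUORUM still available
>>>>>>>     ssh node2 sudo systemctl stop cassandra   # UnavailableException, expected
>>>>>>>
>>>>>>>     # 3. wait, then bring one replica back
>>>>>>>     sleep 60
>>>>>>>     ssh node2 sudo systemctl start cassandra  # timeouts start here instead of recovery
>>>>>>>
>>>>>>>     # 4. compare the two views of the ring
>>>>>>>     ssh node1 nodetool status
>>>>>>>     ssh node2 nodetool status                 # node2 shows node1 as DN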
>>>>>>>
>>>>>>> Is this supposed to happen?
>>>>>>> If we don't run cassandra-stress, everything is fine.
>>>>>>>
>>>>>>> Some workarounds we found to recover from this state are the following
>>>>>>> (sketched in shell right after this list):
>>>>>>> * Restarting node1; the cluster recovers right after node1 is restarted
>>>>>>> * Setting a lower value for dynamic_snitch_reset_interval_in_ms (to
>>>>>>> 60000 or so)
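>>>>>>>
>>>>>>> In shell terms the two workarounds look roughly like this (the
>>>>>>> cassandra.yaml path is from our package install and may differ):
>>>>>>>
>>>>>>>     # workaround 1: bounce the remaining live node; it recovers right away
>>>>>>>     ssh node1 sudo systemctl restart cassandra
>>>>>>>
>>>>>>>     # workaround 2: lower the dynamic snitch reset interval (default is
>>>>>>>     # 600000 ms) in cassandra.yaml on each node, then restart
>>>>>>>     grep dynamic_snitch_reset_interval_in_ms /etc/cassandra/cassandra.yaml
>>>>>>>     # dynamic_snitch_reset_interval_in_ms: 60000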
>>>>>>>
>>>>>>> I don't think either of them is a really good solution.
>>>>>>> Can anyone explain what is going on, and what is the best way to prevent
>>>>>>> this or to recover from it?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Hiro
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
>>>>>>> For additional commands, e-mail: user-help@cassandra.apache.org
>>>>>>>
>>>>>>>
>>

Re: A cluster (RF=3) not recovering after two nodes are stopped

Posted by Hiroyuki Yamada <mo...@gmail.com>.
Hi,

FYI: I created a bug ticket since I think the behavior is just not right.
https://issues.apache.org/jira/browse/CASSANDRA-15138

Thanks,
Hiro

On Mon, May 13, 2019 at 10:58 AM Hiroyuki Yamada <mo...@gmail.com> wrote:

> Hi,
>
> Should I file a bug?
> This doesn't seem to be expected behavior,
> so I think it should at least be documented somewhere.
>
> Thanks,
> Hiro