Posted to user@cassandra.apache.org by Nandakishore Tokala <na...@gmail.com> on 2019/05/24 18:58:32 UTC

Ideal duration for max_hint_window_in_ms

Hi all,

What is the impact of increasing max_hint_window_in_ms? We are seeing
nodes go down, and sometimes we do not bring them back up within 3 hours
(the default hint window). During the subsequent repair we see a lot of
streaming data because the node was down.

So we are planning to increase max_hint_window_in_ms so that there is
less streaming during repair. Is there any drawback to increasing
max_hint_window_in_ms, and what is the ideal value for it (6 hrs, 12
hrs, 24 hrs)?

Thanks
Nanda

Re: Ideal duration for max_hint_window_in_ms

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hello Nanda,

> what is the impact of increasing the duration of the max_hint_window_in_ms

You might want to be aware of the relationship between
'max_hint_window_in_ms', 'gc_grace_seconds' and TTLs, so that you avoid
side effects and get only the impact you want. My colleague Radovan wrote
about this here:
http://thelastpickle.com/blog/2018/03/21/hinted-handoff-gc-grace-demystified.html
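To make that relationship concrete, here is a minimal shell sketch. The function check_hint_window is hypothetical, not a Cassandra tool, and it assumes the rule of thumb we take from Radovan's post (check the post and your version): a hint should never need to outlive the smallest gc_grace_seconds of the tables it touches, so the hint window should stay well below gc_grace_seconds. The example calls use the shipped defaults, 10800000 ms (3 hours) and 864000 s (10 days), plus the proposed 24 hours.

```shell
# Hypothetical helper, not part of Cassandra: compare max_hint_window_in_ms
# (from cassandra.yaml) with the smallest gc_grace_seconds among your tables.
# Assumption: the hint window should stay well below gc_grace_seconds.
check_hint_window() {
    window_ms=$1     # max_hint_window_in_ms, in milliseconds
    gc_grace_s=$2    # smallest gc_grace_seconds, in seconds
    if [ "$window_ms" -ge $((gc_grace_s * 1000)) ]; then
        echo "WARN: hint window ${window_ms} ms is not below gc_grace_seconds ${gc_grace_s} s"
        return 1
    fi
    # Whole hours, for readability only (fractions are truncated).
    echo "OK: hint window $((window_ms / 3600000))h < gc_grace_seconds $((gc_grace_s / 3600))h"
}

check_hint_window 10800000 864000   # defaults: 3 hours vs 10 days
check_hint_window 86400000 864000   # proposed 24-hour window vs 10 days
```

On Cassandra 3.x the per-table values can be listed in cqlsh with SELECT keyspace_name, table_name, gc_grace_seconds FROM system_schema.tables; pass the smallest one to the function.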

Other than that, I think the 3-hour default was picked back when hints were
stored in a system table. That earlier design often led to hints getting
stuck, especially when they grew too big.
In C* 3.0 and later, however, hints are stored as files. I have not heard
of many issues with them since, nor have I heard of people trying to
increase this value.
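Since hints are plain files in C* 3+, one way to see how much is queued for a down node is to look at the hints directory on the other nodes. Below is a minimal shell sketch; it assumes the 3.x file naming <host-id>-<timestamp>-<version>.hints and a hints_directory of /var/lib/cassandra/hints (common for package installs; check hints_directory in your cassandra.yaml). The function name hints_by_host is hypothetical.

```shell
# Hypothetical helper: total size in bytes of hint files per target host ID.
# Assumes file names of the form <host-id>-<timestamp>-<version>.hints,
# where <host-id> is the UUID of the node the hints are destined for.
hints_by_host() {
    dir=${1:-/var/lib/cassandra/hints}   # assumed default; see hints_directory
    for f in "$dir"/*.hints; do
        [ -e "$f" ] || continue          # no hint files: the glob did not match
        name=$(basename "$f" .hints)
        host=${name%-*-*}                # drop the trailing -<timestamp>-<version>
        size=$(wc -c < "$f")
        echo "$host $size"
    done | awk '{ total[$1] += $2 } END { for (h in total) print h, total[h] }'
}
```

Run hints_by_host (or hints_by_host /path/to/hints) on each node; the host IDs it prints can be matched against the Host ID column of nodetool status.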

So if you make sure hinted handoff runs smoothly and does not harm the
cluster when the node comes back up, if you take into account the side
effects of changing this value as described by Radovan, and if you are
using Cassandra 3+, I guess you could give it a try. Also keep in mind that
hints are an optimization (they can be disabled). Hint delivery is not
guaranteed (or at least it was not before C* 3). Hints alone will not
'allow you' to disable repairs safely.
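To get a feel for how 'smoothly' a large backlog is replayed, here is a back-of-the-envelope shell sketch. It assumes the behaviour described in the comment for hinted_handoff_throttle_in_kb in cassandra.yaml (default 1024): the limit is in KB per second per delivery thread, and it is divided by (number of nodes - 1) because several nodes are expected to deliver hints at the same time. The function name and the 2 GB / 5-node figures are hypothetical.

```shell
# Hypothetical estimate: minutes for one node to replay its hint backlog
# to a recovered node, assuming hinted_handoff_throttle_in_kb is divided
# by (nodes - 1) as described in the cassandra.yaml comment.
replay_minutes() {
    backlog_mb=$1    # hints this node holds for the recovered node, in MB
    throttle_kb=$2   # hinted_handoff_throttle_in_kb (default 1024)
    nodes=$3         # number of nodes in the cluster
    if [ "$nodes" -gt 1 ]; then divisor=$((nodes - 1)); else divisor=1; fi
    rate_kb=$((throttle_kb / divisor))   # effective KB/s per delivery thread
    if [ "$rate_kb" -lt 1 ]; then        # guard against division by zero
        echo "rate too low to estimate"
        return 1
    fi
    echo $((backlog_mb * 1024 / rate_kb / 60))
}

replay_minutes 2048 1024 5   # hypothetical: 2 GB backlog, default throttle, 5 nodes
```

The exact numbers matter less than the shape: the backlog, and so the replay time, grows with how long hints are kept, which is why the throttle and max_hints_delivery_threads deserve a look before lengthening the window.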

Now, I don't have experience with keeping hints longer than the default
since they started being stored in files. If you do try it, you should
probably test it in a test cluster first. I'd be happy to hear about your
experience with it, though.
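If you do try it, note that max_hint_window_in_ms is expressed in milliseconds. The minimal shell sketch below (hours_to_ms is a hypothetical helper) prints the cassandra.yaml line for each candidate window from the question. The value is read from cassandra.yaml at startup; recent versions also offer nodetool getmaxhintwindow / setmaxhintwindow to change it at runtime, but check nodetool help on your version, and keep in mind a runtime change is not written back to cassandra.yaml.

```shell
# Hypothetical helper: convert hours to the milliseconds that
# max_hint_window_in_ms expects.
hours_to_ms() {
    echo $(($1 * 3600 * 1000))
}

# Candidate windows from the question: 6, 12 and 24 hours.
for h in 6 12 24; do
    echo "max_hint_window_in_ms: $(hours_to_ms "$h")   # ${h} hours"
done
```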

---------------------------

Also, I think it might be more important to investigate why nodes are
going down and fix that first (or instead). More hints might mean more
pressure on the nodes, so increasing the hint storage time could even be
counter-productive.
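A rough way to see why more hints mean more pressure: the hints kept for a down node are roughly the writes it misses, and coordinators stop writing new hints for it once it has been down longer than the window. This back-of-the-envelope shell sketch (hypothetical helper and figures, integer inputs only, ignoring per-hint overhead) estimates the total hint volume across the cluster for one down node.

```shell
# Hypothetical estimate: MB of hints stored cluster-wide for one down node.
# Hints stop being written once the node has been down longer than the
# window, so the stored volume is capped by min(downtime, window).
estimate_hint_mb() {
    write_mb_s=$1   # write throughput the down node would have received, MB/s
    down_h=$2       # how long the node stays down, in hours
    window_h=$3     # max_hint_window_in_ms expressed in hours
    if [ "$down_h" -lt "$window_h" ]; then h=$down_h; else h=$window_h; fi
    echo $((write_mb_s * h * 3600))
}

estimate_hint_mb 2 30 3    # hypothetical 2 MB/s, 30 h outage, default 3 h window
estimate_hint_mb 2 30 24   # same outage with a 24 h window
```

With these made-up inputs, moving from a 3-hour to a 24-hour window multiplies the hint volume the cluster must store, and later replay, by eight.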

Here are a couple of commands I often use to investigate why nodes are
going down; maybe they will be of some help to you:

- grep -e "WARN" -e "ERROR" /var/log/cassandra/system.log
  # Anything in this output is probably worth your attention. If nodes go
  # down, something should appear here.
- watch -d nodetool tpstats
  # Run this on the worst node at the worst time to see whether any thread
  # pools are piling up tasks in the 'Pending' column. Also check the
  # 'Blocked' columns and the dropped message counts.
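If the full tpstats output is too noisy to watch, a small filter can print only the pools with pending or blocked tasks and the message types with drops. This is a sketch that assumes the 3.x layout of nodetool tpstats (a 'Pool Name' table with the columns Active, Pending, Completed, Blocked, All time blocked, followed by a 'Message type / Dropped' table); tpstats_hotspots is a hypothetical name, and the column positions may need adjusting on other versions.

```shell
# Hypothetical filter for `nodetool tpstats` output (3.x layout assumed):
# prints thread pools with Pending or Blocked > 0 and dropped message types.
tpstats_hotspots() {
    awk '
        /^Pool Name/    { in_pools = 1; next }
        /^Message type/ { in_pools = 0; in_dropped = 1; next }
        NF == 0         { in_pools = 0; next }
        in_pools && NF >= 6 && ($3 > 0 || $5 > 0) {
            print $1, "pending=" $3, "blocked=" $5
        }
        in_dropped && NF >= 2 && $2 > 0 { print $1, "dropped=" $2 }
    '
}

# Usage: nodetool tpstats | tpstats_hotspots
```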

If you'd like some help with your 'main issue' first, we would need more
details and context.

I hope some of this helps :).

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

