Posted to dev@druid.apache.org by Samarth Jain <sa...@gmail.com> on 2019/06/24 23:38:41 UTC

EBS backed data nodes

Hi,

Are there any users out there who have Druid data nodes (historicals and
middle managers) running on instances with EBS-backed disks? If so, what
kind of BalancerStrategy have they been using?

Since EBS storage is generally cheaper than adding new instances, one
of the options we are exploring internally is to scale up the cluster
by dynamically increasing the size of the EBS volumes instead of adding new instances.

Assuming a homogeneous cluster of data nodes and a relatively even
distribution of segments, one naive heuristic would be to increase the EBS
volume size by x% on each data node when the overall disk utilization of
the cluster goes beyond x%. Currently, a server's maxSize is fixed by the
static setting druid.server.maxSize, so we would need to make it dynamically
configurable. We would also need to make sure that new instances are
spun up with the updated maxSize.
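The naive heuristic above can be sketched as follows. This is a minimal illustration, not an existing Druid feature: it assumes each node's used bytes and current maxSize are already known (for example, collected from the Coordinator), and the names `Node`, `cluster_utilization`, `needs_growth`, and `plan_resize` are hypothetical. The post uses the same x for both the utilization trigger and the growth step; the sketch keeps them as two parameters so they can be tuned separately.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical snapshot of one data node's segment storage."""
    name: str
    used_bytes: int
    max_size_bytes: int  # the value configured as druid.server.maxSize


def cluster_utilization(nodes: List[Node]) -> float:
    """Fraction of the cluster's total configured capacity that is in use."""
    return sum(n.used_bytes for n in nodes) / sum(n.max_size_bytes for n in nodes)


def needs_growth(nodes: List[Node], threshold_percent: int) -> bool:
    """True when overall utilization exceeds threshold_percent (integer math avoids float rounding)."""
    total_used = sum(n.used_bytes for n in nodes)
    total_max = sum(n.max_size_bytes for n in nodes)
    return total_used * 100 > threshold_percent * total_max


def plan_resize(nodes: List[Node], threshold_percent: int, growth_percent: int) -> List[Node]:
    """Grow every node's capacity by growth_percent once the cluster crosses the threshold.

    The cluster is assumed homogeneous with evenly spread segments, so every
    node gets the same relative increase. Returns the input list unchanged
    when no growth is needed.
    """
    if not needs_growth(nodes, threshold_percent):
        return nodes
    return [
        Node(n.name, n.used_bytes, n.max_size_bytes * (100 + growth_percent) // 100)
        for n in nodes
    ]


if __name__ == "__main__":
    cluster = [Node("historical-1", 850, 1000), Node("historical-2", 750, 1000)]
    print(cluster_utilization(cluster))  # 0.8
    print([n.max_size_bytes for n in plan_resize(cluster, 75, 20)])  # [1200, 1200]
```

With 80% overall utilization and a 75% trigger, both nodes are planned to grow from 1000 to 1200 units; the resulting sizes would then drive both the EBS resize and the new maxSize on each node.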

Anyway, I would like to know whether the community thinks this is a bad idea in
general. Are there other ways of scaling up the cluster (assuming cluster
CPU utilization is low but disk utilization is high)? Is tiering a better
option?

Thanks,
Samarth

Re: EBS backed data nodes

Posted by Samarth Jain <sa...@gmail.com>.
On the newer AWS instance types, at least, we don't need to restart when
extending the EBS volume.
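A minimal sketch of that online resize step, assuming boto3 and an EC2 client that is allowed to call ModifyVolume; the helper `grow_volume` is hypothetical. Resizing the EBS volume only enlarges the block device: the partition and filesystem usually still have to be grown on the host (for example with `growpart` followed by `resize2fs` or `xfs_growfs`), and Druid will not use the extra space until maxSize is raised.

```python
def grow_volume(ec2_client, volume_id: str, current_size_gib: int, growth_percent: int) -> int:
    """Ask EC2 to enlarge an attached EBS volume and return the requested size in GiB.

    ec2_client is expected to behave like boto3.client("ec2"); only its
    modify_volume(VolumeId=..., Size=...) call is used here.
    """
    # EBS sizes are whole GiB, so take the integer ceiling to guarantee growth.
    new_size_gib = (current_size_gib * (100 + growth_percent) + 99) // 100
    ec2_client.modify_volume(VolumeId=volume_id, Size=new_size_gib)
    return new_size_gib
```

Called as `grow_volume(boto3.client("ec2"), volume_id, 500, 20)`, this would request a 600 GiB volume. Passing the client in, rather than creating it inside the function, keeps the sizing logic testable without AWS credentials.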

On Mon, Jun 24, 2019 at 11:43 PM Himanshu <g....@gmail.com> wrote:

> I would do what you described if both CPU and memory were underutilized at
> their peak usage.
>
> > Currently the config for maxSize of a server is
> > hardcoded in druid.server.maxSize. So we would need to make it dynamically
> > configurable. We would also need to make sure that new instances that come
> > up would be spun up with the updated maxSize.
>
> Yes, it can be made adjustable, but at the same time, whatever "thing"
> increases the EBS volume size and restarts the process could also change
> druid.server.maxSize, either in the properties file or via an environment
> variable.
>
> -- Himanshu

Re: EBS backed data nodes

Posted by Himanshu <g....@gmail.com>.
I would do what you described if both CPU and memory were underutilized at
their peak usage.

> Currently the config for maxSize of a server is
> hardcoded in druid.server.maxSize. So we would need to make it dynamically
> configurable. We would also need to make sure that new instances that come
> up would be spun up with the updated maxSize.

Yes, it can be made adjustable, but at the same time, whatever "thing"
increases the EBS volume size and restarts the process could also change
druid.server.maxSize, either in the properties file or via an environment
variable.

-- Himanshu
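Himanshu's suggestion, having the resize tooling also rewrite the node's configuration before restarting it, might look like the sketch below. It assumes the setting lives in a plain Java properties file such as runtime.properties and that the tooling restarts the process afterwards; `set_max_size` is a hypothetical helper. Depending on the Druid version and setup, the maxSize values inside druid.segmentCache.locations may need to be raised as well.

```python
import re

# Matches an uncommented druid.server.maxSize line using "=" or ":" as the separator
# (the whitespace-only separator allowed by Java properties is not handled here).
_MAX_SIZE_LINE = re.compile(r"^[ \t]*druid\.server\.maxSize[ \t]*[=:].*$", re.MULTILINE)


def set_max_size(properties_text: str, new_max_size: int) -> str:
    """Return properties_text with druid.server.maxSize set to new_max_size.

    An existing setting is replaced in place; otherwise a new line is appended.
    """
    new_line = f"druid.server.maxSize={new_max_size}"
    if _MAX_SIZE_LINE.search(properties_text):
        return _MAX_SIZE_LINE.sub(new_line, properties_text)
    if properties_text and not properties_text.endswith("\n"):
        properties_text += "\n"
    return properties_text + new_line + "\n"
```

The same value would also need to reach whatever template launches new data nodes, so that replacement instances come up with the updated maxSize, as the original post points out.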
