Posted to dev@druid.apache.org by Samarth Jain <sa...@gmail.com> on 2019/02/19 00:26:13 UTC

Re: Slow download of segments from deep storage

I have created an issue,
https://github.com/apache/incubator-druid/issues/7068, that outlines the
limitations in the current approach that prevent us from parallelizing the
segment load/drop workload. I have also raised a PR,
https://github.com/apache/incubator-druid/pull/7088, to help address it.

On Wed, Jan 30, 2019 at 4:40 PM Gian Merlino <gi...@apache.org> wrote:

> I believe today, if you use the (experimental) HTTP-based load queues, they
> will parallelize segment downloads. Adding similar functionality for the
> ZK-based load queues would definitely be useful though, since at this time
> nobody seems to be actively driving a migration to HTTP-based load queues
> being enabled by default.
>
> On Wed, Jan 30, 2019 at 7:20 PM Samarth Jain <sa...@apache.org> wrote:
>
> > We noticed that it takes a long time for the historicals to download
> > segments from deep storage (in our case S3). Looking closer at the code in
> > ZKCoordinator, I noticed that the segment download happens in a
> > single-threaded fashion. The download runs in the single-threaded
> > ExecutorService used by the PathChildrenCache. According to the commentary
> > on https://github.com/apache/incubator-druid/issues/4421 and
> > https://github.com/apache/incubator-druid/issues/3202, the executor service
> > used in PathChildrenCache can only be single-threaded.
> >
> > My proposal is to use a multi-threaded ExecutorService that acts on the
> > events to perform the download. The role of the single-threaded
> > ExecutorService in PathChildrenCache would simply be to delegate the
> > download task to this new executor service.
> >
> > Does that sound feasible? IMO, if this turns out to be functionally correct,
> > it should significantly reduce the time it takes historicals to download
> > all the assigned segments.
> >
> > I would be more than happy to contribute this enhancement to the community.
> >
> > Thanks,
> > Samarth
> >
>
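
For readers following the thread, here is a rough sketch of the delegation
idea from the quoted proposal: a single-threaded executor (standing in for the
one PathChildrenCache uses) only hands each event off, and a separate
fixed-size pool does the slow downloads. This is not the code in the PR. The
class name, loadAll(), download(), and the numLoadingThreads parameter are all
made up for illustration, and the download is simulated with a sleep rather
than a real deep-storage call.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: none of these names come from Druid or Curator.
final class ParallelSegmentLoadSketch {

    // Hypothetical stand-in for pulling one segment from deep storage.
    static void download(String segmentId) throws InterruptedException {
        Thread.sleep(100); // simulate a slow fetch
    }

    // Loads every segment id and returns the ids that finished loading.
    static Set<String> loadAll(List<String> segmentIds, int numLoadingThreads) {
        // Stand-in for the single-threaded executor that delivers events.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        // Assumed worker pool that performs the actual downloads.
        ExecutorService loaders = Executors.newFixedThreadPool(numLoadingThreads);
        Set<String> loaded = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(segmentIds.size());
        try {
            for (String id : segmentIds) {
                // The event thread does no slow work; it only delegates.
                eventThread.execute(() -> loaders.execute(() -> {
                    try {
                        download(id);
                        loaded.add(id);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                }));
            }
            if (!done.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for segment loads");
            }
            return loaded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            eventThread.shutdown();
            loaders.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        Set<String> loaded = loadAll(List.of("seg-1", "seg-2", "seg-3", "seg-4"), 4);
        long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("loaded " + loaded.size() + " segments in ~" + millis + " ms");
    }
}
```

With the pool size set to 1 this degenerates to the serial behavior described
in the original message. A larger pool lets downloads overlap while events
are still consumed in order by the single event thread. Note that the
downloads themselves may then complete out of order, which a real change
would have to account for.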

Re: Slow download of segments from deep storage

Posted by Gian Merlino <gi...@apache.org>.
Hey Samarth,

I wrote a comment in the issue. Thanks for looking at this; it is a
valuable issue IMO.
