Posted to dev@druid.apache.org by Will Lauer <wl...@verizonmedia.com.INVALID> on 2021/08/09 21:29:48 UTC

Question about merging groupby v2 spill files

I recently submitted an issue about "Too many open files" in GroupBy v2 (
https://github.com/apache/druid/issues/11558) and have been investigating a
solution. The problem appears to be that the code preemptively opens all the
spill files for reading; when there is a huge number of spill files (in our
case, a single query generates 110k of them), that triggers the "too many
open files" error even when the file-descriptor ulimit is set to an
otherwise reasonable number. We can work around this for now by raising
"ulimit -n" to a huge value (like 1 million), but I was hoping for a better
solution.

In https://github.com/apache/druid/pull/11559, I attempted to fix this by
opening each file lazily, only once it was ready to be read, and closing it
immediately after it had been fully read. While this fixes the issue in some
small edge cases, it isn't a general solution, because many queries end up
calling CloseableIterators.mergeSorted() to merge all the spill files
together, and a sorted merge necessitates reading all the files at once,
which brings back the "too many open files" error. mergeSorted() gets called
because the grouping code frequently assumes the results should be sorted
and calls ConcurrentGrouper.parallelSortAndGetGroupersIterator().
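
For reference, the shape of that lazy approach is roughly the following.
This is a simplified sketch, not the actual patch in #11559: it treats each
spill file as newline-delimited rows, and LazySpillIterator is a stand-in
for Druid's real spill-reading machinery.

    import java.io.Closeable;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Iterator;
    import java.util.List;
    import java.util.NoSuchElementException;
    import java.util.stream.Stream;

    // Iterate spill files one at a time, holding at most one descriptor
    // open, instead of opening every file up front.
    class LazySpillIterator implements Iterator<String>, Closeable
    {
      private final Iterator<Path> files;
      private Stream<String> currentStream; // backs the currently open file
      private Iterator<String> currentRows;

      LazySpillIterator(List<Path> spillFiles)
      {
        this.files = spillFiles.iterator();
      }

      @Override
      public boolean hasNext()
      {
        while (currentRows == null || !currentRows.hasNext()) {
          close(); // release the descriptor as soon as a file is drained
          if (!files.hasNext()) {
            return false;
          }
          try {
            currentStream = Files.lines(files.next()); // open on demand
            currentRows = currentStream.iterator();
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        }
        return true;
      }

      @Override
      public String next()
      {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return currentRows.next();
      }

      @Override
      public void close()
      {
        if (currentStream != null) {
          currentStream.close();
          currentStream = null;
          currentRows = null;
        }
      }
    }

The catch, as described above, is that this only helps when the files can be
consumed one after another: a k-way sorted merge needs all k inputs open at
the same time.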

My question is: can anyone think of a way to avoid sorting at this level, so
that we don't need all the spill files open at once? Given how sketches work
in Druid right now, I don't see an easy way to reduce the number of spill
files we are generating, so I was hoping to address this on the grouper
side, but so far I can't see a solution that makes this any better. We
aren't blocked, since we can raise the maximum number of open files, but
that is an unpalatable long-term solution.

Will



Will Lauer

Senior Principal Architect, Audience & Advertising Reporting
Data Platforms & Systems Engineering

M 508 561 6427
1908 S. First St
Champaign, IL 61822


Re: [E] Re: Question about merging groupby v2 spill files

Posted by Gian Merlino <gi...@apache.org>.
> We found (after consulting with the datasketches team) that the
> overhead trying to do hierarchical merging with sketches outweighed
> the performance benefit.

It makes sense that this is a risk. The good news is that the groupBy
merging is less resource-intensive on a per-file basis, so you should be
able to have a lot of files open at once. If N-way hierarchical merging of
groupBy spill files existed, and N was set to, say, 10000, then it
shouldn't slow things down too badly.
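
To put a number on that: the merge tree has height ceil(log_N(numFiles)),
so the 110k files you mentioned would merge in two passes at N = 10000. A
toy helper to make the arithmetic concrete (the fan-in values here are
illustrative, not tuned):

    class MergeTree
    {
      // Passes an N-way hierarchical merge needs: ceil(log_fanIn(files)).
      static int mergePasses(long files, int fanIn)
      {
        int passes = 0;
        for (long remaining = files; remaining > 1; passes++) {
          remaining = (remaining + fanIn - 1) / fanIn; // chunks after one pass
        }
        return passes;
      }
    }

    // mergePasses(110_000, 10_000) == 2   (110k -> 11 -> 1)
    // mergePasses(110_000, 100)    == 3   (110k -> 1100 -> 11 -> 1)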


Re: [E] Re: Question about merging groupby v2 spill files

Posted by Will Lauer <wl...@verizonmedia.com.INVALID>.
Interesting. That's worth looking into.

BTW, when we updated to whatever version introduced the hierarchical
merging, we found that it actually performed poorly with sketches, and we
had to disable it. After consulting with the DataSketches team, we
concluded that the overhead of hierarchical merging with sketches
outweighed the performance benefit: sketches have a large finalization
cost, and paying that cost at each level of the hierarchy swamped the gains
from parallelizing the process.

Given that right now we just get failures, hierarchical merging might be
worth doing here anyway: the tradeoff is slow vs. broken rather than slow
vs. fast, and slow wins that one.
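
To spell out the cost concern, a rough back-of-the-envelope model (my
notation, not a measurement): with F spill files and fan-in N, the merge
tree has L = ceil(log_N(F)) levels. If every level has to finalize
(serialize) each intermediate sketch it emits, then per group:

    cost_flat ~= C_merge + C_finalize
    cost_tree ~= L * (C_merge + C_finalize)

When C_finalize >> C_merge, the extra (L - 1) finalizations can easily
swamp whatever the parallelism gains back, which matches what we saw.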

Will



Will Lauer

Senior Principal Architect, Audience & Advertising Reporting
Data Platforms & Systems Engineering

M 508 561 6427
1908 S. First St
Champaign, IL 61822





Re: Question about merging groupby v2 spill files

Posted by Gian Merlino <gi...@apache.org>.
Hey Will,

The sorting that happens on the data servers is really useful, because it
means the Broker can do its part of the query fully streaming instead of
buffering things up.

At one point we had a similar problem in ingestion (you could have a ton of
spill files if you had a lot of sketches) and ended up addressing that by
doing the merge hierarchically. Something similar should work in the
SpillingGrouper. Instead of opening everything all at once, we could open
things in chunks of N, merge those, and then proceed hierarchically. The
merge tree would have logarithmic height.
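
Roughly like this. It's a minimal sketch over sorted newline-delimited
files, not the actual SpillingGrouper code: mergeChunk() stands in for
whatever CloseableIterators.mergeSorted() plus a spill writer would do, and
the point is that each chunk's descriptors are closed before the next chunk
opens.

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;
    import java.util.stream.Stream;

    class HierarchicalSpillMerge
    {
      // Merge spill files in chunks of at most fanIn, writing each merged
      // chunk back to disk, until one file remains. At most fanIn files are
      // open at any moment; the tree has height ceil(log_fanIn(numFiles)).
      static Path mergeHierarchically(List<Path> spillFiles, int fanIn) throws IOException
      {
        if (spillFiles.isEmpty()) {
          throw new IllegalArgumentException("no spill files");
        }
        List<Path> level = new ArrayList<>(spillFiles);
        while (level.size() > 1) {
          List<Path> next = new ArrayList<>();
          for (int i = 0; i < level.size(); i += fanIn) {
            next.add(mergeChunk(level.subList(i, Math.min(i + fanIn, level.size()))));
          }
          level = next;
        }
        return level.get(0);
      }

      // K-way merge of sorted, newline-delimited files into a new temp file.
      private static Path mergeChunk(List<Path> chunk) throws IOException
      {
        Path out = Files.createTempFile("merged-spill", ".tmp");
        List<Stream<String>> streams = new ArrayList<>();
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.head));
        try (BufferedWriter writer = Files.newBufferedWriter(out)) {
          for (Path p : chunk) {
            Stream<String> s = Files.lines(p); // one descriptor per input
            streams.add(s);
            Iterator<String> rows = s.iterator();
            if (rows.hasNext()) {
              heap.add(new Cursor(rows.next(), rows));
            }
          }
          while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            writer.write(c.head);
            writer.newLine();
            if (c.rest.hasNext()) {
              c.head = c.rest.next();
              heap.add(c);
            }
          }
        } finally {
          streams.forEach(Stream::close); // close this chunk before the next
        }
        return out;
      }

      private static class Cursor
      {
        String head;
        final Iterator<String> rest;

        Cursor(String head, Iterator<String> rest)
        {
          this.head = head;
          this.rest = rest;
        }
      }
    }

Writing the merged chunk back to disk at each level is what bounds the open
file count; a purely lazy merge tree would still end up holding every leaf
file open at once.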

The ingestion version of this was:
https://github.com/apache/druid/pull/10689
