Posted to dev@druid.apache.org by li...@gmail.com on 2019/02/26 19:07:08 UTC

Druid metadata query

Hi, I noticed that the query node sends a metadata query to the historical nodes once new segments are published, and this triggers an analysis method that reads all segments into memory to do the analysis. I tried to implement lazy caching for columns, which reads segments from disk only when they are accessed; however, the analysis method still reads all the data into memory to process the metadata query. Would it be a good idea to migrate this analysis step to the middleManager and persist the result in deep storage, so that the historical node could just read that file to answer metadata queries?
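[Editor's note: for context, the metadata query discussed here is Druid's native segmentMetadata query. A minimal sketch in Python of what one looks like with analysis types requested; the datasource name and interval are placeholder values:]

```python
import json

# Minimal native segmentMetadata query of the kind sent to historicals.
# "wikipedia" and the interval are placeholder values for illustration.
query = {
    "queryType": "segmentMetadata",
    "dataSource": "wikipedia",
    "intervals": ["2019-01-01/2019-02-01"],
    # Requesting analysis types is what forces historicals to inspect
    # column data; "cardinality" and "size" are two of the built-in types.
    "analysisTypes": ["cardinality", "size"],
}

# In a live cluster this JSON body would be POSTed to the broker's
# /druid/v2 endpoint; printing it keeps the sketch self-contained.
print(json.dumps(query, indent=2))
```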

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@druid.apache.org
For additional commands, e-mail: dev-help@druid.apache.org


Re: Druid metadata query

Posted by Leo <li...@gmail.com>.
Hi, I'm sorry I didn't make it clear: I tried to do lazy loading for the segment data, not the metadata. I'd like some columns of a segment not to be loaded on historical nodes until they are accessed. But SegmentMetadata needs to be produced as soon as a new segment arrives on a historical node, and that process has to read the segments in and run the analysis. So I'm hoping to get the metadata without reading any segments in.
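[Editor's note: the lazy-loading idea described above can be illustrated with a toy sketch, which is not Druid's actual code: a column is read from disk only on first access and cached afterwards.]

```python
import json
import os
import tempfile

class LazyColumn:
    """Toy stand-in for a segment column that is read only on first access."""

    def __init__(self, path):
        self._path = path
        self._data = None  # nothing loaded yet

    @property
    def data(self):
        if self._data is None:            # first access: hit the disk
            with open(self._path) as f:
                self._data = json.load(f)
        return self._data                 # later accesses: served from cache

# Demo: write a fake column file, then load it lazily.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    json.dump([1, 2, 3], f)

col = LazyColumn(path)
print(col._data)   # None -- not loaded yet
print(col.data)    # [1, 2, 3] -- loaded on demand
os.remove(path)
```

The tension in the thread is that metadata analysis touches every column up front, which defeats a deferred-read scheme like this one.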

On 2019/02/26 22:09:24, Gian Merlino <gi...@apache.org> wrote: 
> Hmm. I think you're talking about the SegmentMetadata queries that
> DruidSchema runs. The intent is that they include an empty analysisTypes
> list, so they only use cached metadata and don't actually read segments,
> and are pretty resource-light on historicals. But if you implemented some
> sort of lazy loading for that metadata, those wouldn't play well together.
> I'm not sure what the best approach is here. What's the purpose of the lazy
> loading? If we need to make them play better together, one way to do that
> could be to add the information that the broker needs to the segment-level
> "Metadata" object, which I think is probably going to be faster to load,
> and then keep loading that eagerly.



Re: Druid metadata query

Posted by Gian Merlino <gi...@apache.org>.
Hmm. I think you're talking about the SegmentMetadata queries that
DruidSchema runs. The intent is that they include an empty analysisTypes
list, so they only use cached metadata and don't actually read segments,
and are pretty resource-light on historicals. But if you implemented some
sort of lazy loading for that metadata, those wouldn't play well together.
I'm not sure what the best approach is here. What's the purpose of the lazy
loading? If we need to make them play better together, one way to do that
could be to add the information that the broker needs to the segment-level
"Metadata" object, which I think is probably going to be faster to load,
and then keep loading that eagerly.
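[Editor's note: the broker-side query described above, with an empty analysisTypes list so that historicals answer from cached per-segment metadata rather than scanning column data, would look roughly like this; the datasource and interval are placeholder values.]

```python
import json

# segmentMetadata query with an empty analysisTypes list, as DruidSchema
# issues it: no analysis types requested, so historicals can respond from
# cached metadata without reading column data. Placeholder values below.
query = {
    "queryType": "segmentMetadata",
    "dataSource": "wikipedia",
    "intervals": ["2019-01-01/2019-02-01"],
    "analysisTypes": [],
}
print(json.dumps(query, indent=2))
```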

On Tue, Feb 26, 2019 at 11:11 AM liliuleo93@gmail.com <li...@gmail.com>
wrote:

> Hi, I noticed that the query node sends a metadata query to the
> historical nodes once new segments are published, and this triggers an
> analysis method that reads all segments into memory to do the analysis. I
> tried to implement lazy caching for columns, which reads segments from disk
> only when they are accessed; however, the analysis method still reads all
> the data into memory to process the metadata query. Would it be a good idea
> to migrate this analysis step to the middleManager and persist the result
> in deep storage, so that the historical node could just read that file to
> answer metadata queries?