Posted to dev@impala.apache.org by Todd Lipcon <to...@cloudera.com> on 2018/08/21 16:41:19 UTC

Improving latency of catalog update propagation?

Hey folks,

In my recent forays into the catalog->statestore->impalad metadata
propagation code base, I noticed that the latency of any update is
typically between 2-4 seconds with the standard 2-second statestore polling
interval. That's because the code currently works as follows:

1. in the steady state with no recent metadata changes, the catalogd's
state is:
-- topic_updates_ready_ = true
-- pending_topic_updates_ = empty

2. some metadata change happens, which modifies the version numbers in the
Java catalog but doesn't modify any of the C++ side state

3. the next statestore poll happens due to the normal interval expiring. On
average, this will take *1/2 the polling interval*
-- this sees that pending_topic_updates_ is empty, so returns no results.
-- it sets topic_updates_ready_ = false and triggers the "gather" thread

4. the "gather" thread wakes up and gathers updates, filling in
'pending_topic_updates_' and setting 'topic_updates_ready_' back to true
(typically subsecond in smallish catalogs, so this happens before the next
poll)

5. wait *another full statestore polling interval* (2 seconds) after step
#3 above, at which point we deliver the metadata update to the statestore

6. wait on average *1/2 the polling interval* until any particular impalad
gets the update from #4

So, in the absolute best case, we wait one full polling interval (2
seconds), and in the worst case we wait two polling intervals (4 seconds).
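
To make that concrete, here is a rough model of the flow above. It is
purely illustrative: apart from topic_updates_ready_ and
pending_topic_updates_, the names are invented, and the real
CatalogServer/statestore plumbing is more involved than this.

// Simplified, illustrative model of the current behavior described in steps
// 1-6 above. Apart from topic_updates_ready_ / pending_topic_updates_, all
// names are invented; this is not the actual Impala code.
#include <condition_variable>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class CurrentTopicModel {
 public:
  // Called from the statestore's periodic poll. Returns immediately: either
  // hands back whatever the gather thread staged earlier, or returns nothing
  // and kicks off a new gather.
  std::vector<std::string> GetNextTopicUpdate() {
    std::lock_guard<std::mutex> l(lock_);
    if (!topic_updates_ready_) return {};  // gather still in progress
    if (pending_topic_updates_.empty()) {
      // Nothing staged yet: trigger the gather thread and return empty. The
      // gathered delta is only delivered on the *next* poll, which is where
      // the extra full polling interval comes from.
      topic_updates_ready_ = false;
      gather_cv_.notify_one();
      return {};
    }
    std::vector<std::string> result;
    result.swap(pending_topic_updates_);
    return result;
  }

  // Body of the "gather" thread (runs on its own dedicated thread).
  void GatherLoop() {
    std::unique_lock<std::mutex> l(lock_);
    while (true) {
      gather_cv_.wait(l, [this] { return !topic_updates_ready_; });
      l.unlock();
      // Stand-in for the (potentially slow) JNI call into the Java catalog.
      std::vector<std::string> delta = {"<serialized catalog delta>"};
      l.lock();
      pending_topic_updates_ = std::move(delta);
      topic_updates_ready_ = true;
    }
  }

 private:
  std::mutex lock_;
  std::condition_variable gather_cv_;
  bool topic_updates_ready_ = true;
  std::vector<std::string> pending_topic_updates_;
};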

Has anyone looked into optimizing this at all? It seems like we could have
metadata changes trigger an immediate "collection" into the C++ side, and
have the statestore update callback wait ("long poll" style) for an update
rather than skip if there is nothing available.
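
Very roughly, I'm imagining something like the following (again an
illustrative sketch with invented names, not a patch): a metadata change
pushes its delta into the pending state right away and signals a condition
variable, and the statestore callback blocks on that condition variable up
to some deadline instead of returning empty.

// Illustrative sketch of the "long poll" idea; invented names, not real code.
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <vector>

class LongPollTopicModel {
 public:
  // Called whenever a DDL/DML bumps the catalog version: collection into the
  // C++ side happens immediately rather than waiting for the next poll.
  void OnMetadataChange(std::vector<std::string> delta) {
    {
      std::lock_guard<std::mutex> l(lock_);
      pending_topic_updates_.insert(pending_topic_updates_.end(),
                                    delta.begin(), delta.end());
    }
    update_available_.notify_all();
  }

  // Statestore update callback: instead of returning empty when nothing is
  // staged, wait ("long poll" style) for an update or a deadline, then hand
  // the delta off directly.
  std::vector<std::string> GetNextTopicUpdate(std::chrono::milliseconds max_wait) {
    std::unique_lock<std::mutex> l(lock_);
    update_available_.wait_for(l, max_wait,
                               [this] { return !pending_topic_updates_.empty(); });
    std::vector<std::string> result;
    result.swap(pending_topic_updates_);
    return result;  // may be empty if the deadline expired with no changes
  }

 private:
  std::mutex lock_;
  std::condition_variable update_available_;
  std::vector<std::string> pending_topic_updates_;
};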

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Improving latency of catalog update propagation?

Posted by Tim Armstrong <ta...@cloudera.com>.
Yeah, I think the angle makes sense to pursue. I don't feel strongly about
whether now or later is the right time, but it does seem like it's not the
immediate highest priority.

On Tue, Aug 21, 2018 at 1:57 PM, Tianyi Wang <tw...@cloudera.com> wrote:

> GetCatalogDelta used to block catalogd from executing DDLs and the pending
> struct was yet another cache to smooth things a little.
>
> --
> Tianyi Wang
>

Re: Improving latency of catalog update propagation?

Posted by Tianyi Wang <tw...@cloudera.com>.
GetCatalogDelta used to block catalogd from executing DDLs and the pending
struct was yet another cache to smooth things a little.

On Tue, Aug 21, 2018 at 11:28 AM Todd Lipcon <to...@cloudera.com> wrote:

> One more parting thought: why don't we just call 'GetCatalogDelta()'
> directly from the catalog callback in order to do a direct handoff, instead
> of storing them in this 'pending' struct? Given the statestore uses a
> dedicated thread per subscriber (right?) it seems like it would be fine for
> the update callback to take a long time, no?
>
> -Todd
>
-- 
Tianyi Wang

Re: Improving latency of catalog update propagation?

Posted by Todd Lipcon <to...@cloudera.com>.
One more parting thought: why don't we just call 'GetCatalogDelta()'
directly from the catalog callback in order to do a direct handoff, instead
of storing them in this 'pending' struct? Given the statestore uses a
dedicated thread per subscriber (right?) it seems like it would be fine for
the update callback to take a long time, no?
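
Roughly what I have in mind, as a sketch only (the real GetCatalogDelta()
signature and the callback plumbing are different, and the version
bookkeeping here is invented):

// Hypothetical sketch of the direct-handoff idea; the types, the version
// bookkeeping and the GetCatalogDelta() signature are all invented here.
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

struct CatalogDelta {
  long max_version = 0;
  std::vector<std::string> objects;
};

// Stand-in for the (slow, JNI-backed) call into the Java catalog.
CatalogDelta GetCatalogDelta(long from_version) { return {from_version, {}}; }

class DirectHandoffModel {
 public:
  // Statestore update callback: compute the delta inline and hand it straight
  // back instead of staging it in a 'pending' struct for the *next* poll.
  // The statestore's thread-per-subscriber model is what would make a slow
  // callback tolerable here.
  std::vector<std::string> UpdateCallback() {
    CatalogDelta delta = GetCatalogDelta(last_sent_version_);
    last_sent_version_ = std::max(last_sent_version_, delta.max_version);
    return std::move(delta.objects);
  }

 private:
  long last_sent_version_ = 0;
};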

-Todd

On Tue, Aug 21, 2018 at 11:09 AM, Todd Lipcon <to...@cloudera.com> wrote:

> Thanks, Tim. I'm guessing once we switch over these RPCs to KRPC instead
> of Thrift we'll alleviate some of the scalability issues and maybe we can
> look into increasing frequency or doing a "push" to the statestore, etc. I
> probably won't work on this in the near term to avoid complicating the
> ongoing changes with catalog.
>
> -Todd
>

-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Improving latency of catalog update propagation?

Posted by Todd Lipcon <to...@cloudera.com>.
Thanks, Tim. I'm guessing once we switch over these RPCs to KRPC instead of
Thrift we'll alleviate some of the scalability issues and maybe we can look
into increasing frequency or doing a "push" to the statestore, etc. I
probably won't work on this in the near term to avoid complicating the
ongoing changes with catalog.

-Todd

On Tue, Aug 21, 2018 at 10:22 AM, Tim Armstrong <ta...@cloudera.com>
wrote:

> This is somewhat relevant for admission control too - I had thought about
> some of these issues in that context, because reducing the latency of
> admission control's state propagation helps avoid overadmission, but having a
> very high statestore frequency is very inefficient and doesn't scale well to
> larger clusters.
>
> For the catalog updates I agree we could do something with long polls since
> it's a single producer so that the "idle" state of the system has a thread
> sitting in the update callback on catalogd waiting for an update.
>
> I'd also thought at one point about allowing subscribers to notify the
> statestore that they had something to add to the topic. That could be
> treated as a hint to the statestore to schedule the subscriber update
> sooner. This would also work for admission control since coordinators could
> notify the statestore when the first query was admitted after the previous
> statestore update.
>

-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Improving latency of catalog update propagation?

Posted by Tim Armstrong <ta...@cloudera.com>.
This is somewhat relevant for admission control too - I had thought about
some of these issues in that context, because reducing the latency of
admission control's state propagation helps avoid overadmission, but having a
very high statestore frequency is very inefficient and doesn't scale well to
larger clusters.

For the catalog updates I agree we could do something with long polls since
it's a single producer so that the "idle" state of the system has a thread
sitting in the update callback on catalogd waiting for an update.

I'd also thought at one point about allowing subscribers to notify the
statestore that they had something to add to the topic. That could be
treated as a hint to the statestore to schedule the subscriber update
sooner. This would also work for admission control since coordinators could
notify the statestore when the first query was admitted after the previous
statestore update.
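
Something like the following, very roughly (an illustrative sketch, not
the real statestore scheduler; the NotifyTopicDirty() hook is invented for
the example): the hint just wakes that subscriber's update loop early
instead of letting it sleep out the full interval.

// Illustrative sketch of the "subscriber hint" idea; invented names, not the
// real statestore code.
#include <chrono>
#include <condition_variable>
#include <mutex>

class SubscriberUpdateScheduler {
 public:
  explicit SubscriberUpdateScheduler(std::chrono::milliseconds interval)
      : interval_(interval) {}

  // Hint (e.g. delivered via an RPC from the subscriber) that new topic
  // entries are waiting: from catalogd after a DDL, or from a coordinator
  // after admitting the first query since the previous statestore update.
  void NotifyTopicDirty() {
    {
      std::lock_guard<std::mutex> l(lock_);
      dirty_ = true;
    }
    cv_.notify_one();
  }

  // Per-subscriber update loop: sleep for the normal interval, but wake early
  // if the subscriber signalled that it has something to send.
  template <typename SendUpdateFn>
  void UpdateLoop(SendUpdateFn send_update) {
    std::unique_lock<std::mutex> l(lock_);
    while (true) {
      cv_.wait_for(l, interval_, [this] { return dirty_; });
      dirty_ = false;
      l.unlock();
      send_update();  // exchange topic deltas with this subscriber
      l.lock();
    }
  }

 private:
  const std::chrono::milliseconds interval_;
  std::mutex lock_;
  std::condition_variable cv_;
  bool dirty_ = false;
};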

On Tue, Aug 21, 2018 at 9:41 AM, Todd Lipcon <to...@cloudera.com> wrote:

> Hey folks,
>
> In my recent forays into the catalog->statestore->impalad metadata
> propagation code base, I noticed that the latency of any update is
> typically between 2-4 seconds with the standard 2-second statestore polling
> interval. That's because the code currently works as follows:
>
> 1. in the steady state with no recent metadata changes, the catalogd's
> state is:
> -- topic_updates_ready_ = true
> -- pending_topic_updates_ = empty
>
> 2. some metadata change happens, which modifies the version numbers in the
> Java catalog but doesn't modify any of the C++ side state
>
> 3. the next statestore poll happens due to the normal interval expiring. On
> average, this will take *1/2 the polling interval*
> -- this sees that pending_topic_updates_ is empty, so returns no results.
> -- it sets topic_updates_ready_ = false and triggers the "gather" thread
>
> 4. the "gather" thread wakes up and gathers updates, filling in
> 'pending_topic_updates_' and setting 'topic_updates_ready_' back to true
> (typically subsecond in smallish catalogs, so this happens before the next
> poll)
>
> 5. wait *another full statestore polling interval* (2 seconds) after step
> #3 above, at which point we deliver the metadata update to the statestore
>
> 6. wait on average *1/2 the polling interval* until any particular impalad
> gets the update from #4
>
> So, in the absolute best case, we wait one full polling interval (2
> seconds), and in the worst case we wait two polling intervals (4 seconds).
>
> Has anyone looked into optimizing this at all? It seems like we could have
> metadata changes trigger an immediate "collection" into the C++ side, and
> have the statestore update callback wait ("long poll" style) for an update
> rather than skip if there is nothing available.
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>