Posted to dev@impala.apache.org by Marcel Kornacker <ma...@gmail.com> on 2018/06/07 16:27:29 UTC

Re: Proposal for a new approach to Impala metadata

Hey Todd,

thanks for putting that together, that area certainly needs some work.
I think there are a number of good ideas in the proposal, and I also
think there are a number of ways to augment it to make it even better:
1. The size of the cache is obviously a result of the caching
granularity (or lack thereof), and switching to a bounded size with
more granularity seems like a good choice.
2. However, one of the biggest usability problems stems from the need
for manual metadata discovery (via Invalidate/Refresh). Introducing a
TTL for cached objects seems like a bad approach, because it creates a
conflict between metadata freshness and caching
effectiveness/performance. This isn't really something that users want
to have to trade off.
3. Removing the role of a central "metadata curator" effectively
closes the door on doing automated metadata discovery in the future
(which in turn seems like a prerequisite for services like automated
format conversion, i.e., anything that needs to react to the presence of
new data).

To address those concerns, I am proposing the following changes:
1. Keep the role of a central metadata curator (let's call it metad
just for the sake of the argument, so as not to confuse it with the
existing catalogd).
2. Partition the namespace in some way (for instance, into a fixed
number of topics, with each database assigned to a topic) and allow
coordinators to join and drop topics as needed in order to stay within
their cache size bounds. (A small sketch of this idea follows below.)
3. The improvements you suggest for streamlining the code, avoiding
in-place updates, and making metadata retrieval more efficient still
apply here; they're just executed by a different process.
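
As a rough sketch of what point 2 could look like in code (a minimal
illustration; the hash-based assignment and topic naming are assumptions,
not part of the proposal):

class TopicAssignment {
  // Hypothetical: deterministically assign each database to one of a fixed
  // number of topics. Coordinators would subscribe to or drop whole topics
  // in order to stay within their cache size bounds.
  static String topicForDatabase(String dbName, int numTopics) {
    int bucket = Math.floorMod(dbName.hashCode(), numTopics);
    return "catalog-topic-" + bucket;
  }
}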

On Tue, May 22, 2018 at 9:27 PM, Todd Lipcon <to...@cloudera.com> wrote:
> Hey Impala devs,
>
> Over the past 3 weeks I have been investigating various issues with
> Impala's treatment of metadata. Based on data from a number of user
> deployments, and after discussing the issues with a number of Impala
> contributors and committers, I've come up with a proposal for a new design.
> I've also developed a prototype to show that the approach is workable and
> is likely to achieve its goals.
>
> Rather than describe the design in duplicate, I've written up a proposal
> document here:
> https://docs.google.com/document/d/1WcUQ7nC3fzLFtZLofzO6kvWdGHFaaqh97fC_PvqVGCk/edit?ts=5b04a6b8#
>
> Please take a look and provide any input, questions, or concerns.
>
> Additionally, if any users on this list have experienced metadata-related
> problems in the past and would be willing to assist in testing or
> contribute workloads, please feel free to respond to me either on or off
> list.
>
> Thanks
> Todd

Re: Proposal for a new approach to Impala metadata

Posted by Todd Lipcon <to...@cloudera.com.INVALID>.
Hi Marcel,

Sorry for the slow response; I was out of the office for a short vacation.
Comments inline:

On Sat, Jun 30, 2018 at 5:07 PM, Marcel Kornacker <ma...@gmail.com> wrote:

> Responses/comments inline.
>
> Before those, a note about process:
> It looks like the work on this proposal is already underway. However, the
> doc you sent out still has many unanswered questions, one of which is
> "to what extent will these changes degrade existing use cases that are
> currently well supported".
>

I thought I'd answered that one, but just to be clear: we do not want to
degrade existing use cases that are well supported. The goal is not to
break any existing use cases, and we expect to substantially improve
performance and reliability for many, as well as to enable larger-scale use
cases than are possible with the current design.


> I think it is very important to spell out the goals and non-goals of this
> proposal explicitly, preferably in a separate section, so that there can be
> a meaningful discussion of those in the Impala community.
>
>
Right, sorry that I didn't get a chance to do this after we'd discussed it
previously. I will prioritize that this week.


> The proposal lists 6 contributors, all of whom are/were Cloudera employees
> at the time of writing. It appears that there was a fair amount of
> discussion and experimentation that went into this proposal, but this
> wasn't done publicly.
>

Indeed I did discuss the proposal with some colleagues prior to publishing
the doc on the mailing list. However, the doc itself was only started the
day before it was published, and I solicited a few comments from coworkers
as more of a "proof read". I personally wrote the document as well as
personally did the prototype work discussed therein. I wanted to be
inclusive of the other folks who had helped with the doc proof-reading so I
included them in the heading. I'll remove all of the names including my own
now, since the doc should be seen as a product of the open source community
rather than any particular contributor(s).

> This can create the impression that community involvement is seen as an
> afterthought rather than an integral part of the process (and the warning
> in red at the top that 'this document is public' doesn't really do anything
> to counter that impression :)).
>
>
Sure, I can see that. I figured that such a warning would be appropriate
because the document discusses a lot of workload-specific data, and I
didn't want anyone to comment "In my work at FooCorp we have a table
storing info about our 12,300 customers." Such a comment might come from a
fellow Cloudera employee or from someone at another contributing
organization. Happy to remove that heading if it seems like it's not
inclusive.


-Todd



> On Fri, Jun 8, 2018 at 9:54 AM, Todd Lipcon <to...@cloudera.com> wrote:
>
>> On Thu, Jun 7, 2018 at 9:27 AM, Marcel Kornacker <ma...@gmail.com>
>> wrote:
>>
>>> Hey Todd,
>>>
>>> thanks for putting that together, that area certainly needs some work.
>>> I think there are a number of good ideas in the proposal, and I also
>>> think there are a number of ways to augment it to make it even better:
>>> 1. The size of the cache is obviously a result of the caching
>>> granularity (or lack thereof), and switching to a bounded size with
>>> more granularity seems like a good choice.
>>> 2. However, one of the biggest usability problems stems from the need
>>> for manual metadata discovery (via Invalidate/Refresh). Introducing a
>>> TTL for cached objects seems like a bad approach, because it creates a
>>> conflict between metadata freshness and caching
>>> effectiveness/performance. This isn't really something that users want
>>> to have to trade off.
>>>
>>
>> Agreed. In the "future directions" part of the document I included one
>> item about removing the need for invalidate/refresh, perhaps by subscribing
>> to the push notification mechanisms provided by HMS/NN/S3/etc. With those
>> notification mechanisms, the cache TTL could be dialed way up (perhaps to
>> infinity) and only rely on cache capacity for eviction.
>>
>> In some cases the notification mechanisms provide enough information to
>> update the cache in place (i.e., the notification has the new information)
>> whereas in others it would just contain enough information to perform
>> invalidation. But from the user's perspective it achieves near-real-time
>> freshness either way.
>>
>> Note, though, that absent transactional coordination between Impala
>> and source systems, we'll always be in the realm of "soft guarantees". That
>> is to say, the notification mechanisms are all asynchronous and could be
>> backlogged due to a number of issues out of our control (e.g., momentary
>> network congestion between HMS and Impala, or unknown issues on the S3/SNS
>> backend). As you pointed out in the doc, "soft guarantees" are somewhat
>> weak when you come at it from the perspective of an application developer.
>> So, even with those mechanisms, we may need some explicit command like
>> "sync metadata;" or somesuch that acts like a memory barrier.
>>
>> I think trying to bite all of that off in one go will increase the scope
>> of the project quite a bit, though. Did you have some easier mechanism in
>> mind to improve usability here that I'm missing?
>>
>>> 3. Removing the role of a central "metadata curator" effectively
>>> closes the door on doing automated metadata discovery in the future
>>> (which in turn seems like a prerequisite for services like automated
>>> format conversion, i.e., anything that needs to react to the presence of
>>> new data).
>>>
>>
>> That's fair. Something like catalogd may still have its place to
>> centralize the subscription to the notification services and re-broadcast
>> them to the cluster as necessary. My thinking was that it looks
>> sufficiently different from the current catalogd, though, that it would be
>> a new component rather than an evolution of the existing
>> catalogd-statestore-impalad metadata propagation protocol.
>>
>>
>>> To address those concerns, I am proposing the following changes:
>>> 1. Keep the role of a central metadata curator (let's call it metad
>>> just for the sake of the argument, so as not to confuse it with the
>>> existing catalogd).
>>>
>>
>> Per above, I think this could be useful.
>>
>>
>>> 2. Partition the namespace in some way (for instance, into a fixed
>>> number of topics, with each database assigned to a topic) and allow
>>> coordinators to join and drop topics as needed in order to stay within
>>> their cache size bounds.
>>>
>>
>> I don't think partitioning to the granularity of a database is
>> sufficient. We've found that even individual tables can cause
>> metadata updates in excess of 1GB, even though those tables are
>> infrequently accessed, and, when they are accessed, often only a very
>> small subset of their partitions is needed. Often these mega tables live
>> in the same database as the rest of the tables accessed by a workload,
>> and asking users to manually partition them out lets physical
>> implementation concerns bleed into logical schema design decisions.
>> That's quite a hassle when considering things like security policies
>> being enforced on a per-database basis, right?
>>
>
> So then partition into a fixed number of topics and assign each *table* to
> one? Or are you saying that caching granularity should be below the table
> level? (Note that the actual granularity isn't spelled out in the
> proposal.)
>
>
>>
>>> 3. The improvements you suggest for streamlining the code, avoiding
>>> in-place updates, and making metadata retrieval more efficient still
>>> apply here; they're just executed by a different process.
>>>
>>
>> I think a critical piece of the design, though, is that the incremental
>> and fine-grained metadata retrieval extends all the way down to the
>> impalad, and not just to the centralized coordinator. If we are pushing
>> metadata at database (or even table) granularity from the catalog to the
>> impalads, we won't reap the major benefits described here.
>>
>
> Why would caching at the table level not work? Is the assumption that
> there are lots of very large and very infrequently accessed tables around,
> so that they would destroy the cache hit rate?
>
> It would be good to get clarity on these scenarios. It feels like the
> proposal is being targeted at some very specific use cases, but these
> haven't been spelled out clearly. If these are extreme use cases
> characterized by catalogs that are several orders of magnitude larger than
> what an average catalog might look like, it would be good to recognize that
> and be cognizant of the extent to which more middle-of-the-road use cases
> will be impacted.
>
>
>>
>> Here's another approach we could consider long term:
>> - keep a central daemon responsible for interacting with source systems
>> - impalads don't directly access source systems, but instead send metadata
>> fetch requests to the central coordinator
>> - impalads maintain a small cache of their own, and the central
>> coordinator maintains a larger cache. If the impalad requests some info
>> that the central daemon doesn't have, it fetches on demand from the source.
>> - central coordinator propagates invalidations to impalads
>>
>> In essence, it's like the current catalog design, but based on a
>> fine-grained "pull" design instead of a course-grained "push/broadcast"
>> design.
>>
>
> The problem with 'pull' designs is that they don't work well with
> fine-grained metadata discovery (e.g., a new file arrives; your proposal
> sounds like it would require an invalidation at the partition level, or at
> the full table level in the case of an unpartitioned table). Given that
> Refresh/Invalidate are one of the most problematic aspects of running
> Impala in a production system, it would be good to evolve the handling of
> metadata in a direction that doesn't preclude fine-grained and automated
> metadata propagation in the future.
>
>
>>
>> This allows impalads to scale out without linearly increasing the amount
>> of load on source systems. The downsides, though, are:
>> - extra hop between impalad and source system on a cache miss
>> - extra complexity in failure cases (the central daemon becomes a SPOF, and if
>> we replicate it we have more complex consistency to worry about)
>> - scalability benefits may only be really important on large clusters.
>> Small clusters can often be served sufficiently by just one or two
>> coordinator impalads.
>>
>> -Todd
>>
>>
>>> On Tue, May 22, 2018 at 9:27 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>> > Hey Impala devs,
>>> >
>>> > Over the past 3 weeks I have been investigating various issues with
>>> > Impala's treatment of metadata. Based on data from a number of user
>>> > deployments, and after discussing the issues with a number of Impala
>>> > contributors and committers, I've come up with a proposal for a new
>>> design.
>>> > I've also developed a prototype to show that the approach is workable
>>> and
>>> > is likely to achieve its goals.
>>> >
>>> > Rather than describe the design in duplicate, I've written up a
>>> proposal
>>> > document here:
>>> > https://docs.google.com/document/d/1WcUQ7nC3fzLFtZLofzO6kvWdGHFaaqh97fC_PvqVGCk/edit?ts=5b04a6b8#
>>> >
>>> > Please take a look and provide any input, questions, or concerns.
>>> >
>>> > Additionally, if any users on this list have experienced
>>> metadata-related
>>> > problems in the past and would be willing to assist in testing or
>>> > contribute workloads, please feel free to respond to me either on or
>>> off
>>> > list.
>>> >
>>> > Thanks
>>> > Todd
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Proposal for a new approach to Impala metadata

Posted by Marcel Kornacker <ma...@gmail.com>.
Responses/comments inline.

Before those, a note about process:
It looks like the work on this proposal is already underway. However, the
doc you sent out still has many unanswered questions, one of which is
"to what extent will these changes degrade existing use cases that are
currently well supported".
I think it is very important to spell out the goals and non-goals of this
proposal explicitly, preferably in a separate section, so that there can be
a meaningful discussion of those in the Impala community.

The proposal lists 6 contributors, all of whom are/were Cloudera employees
at the time of writing. It appears that there was a fair amount of
discussion and experimentation that went into this proposal, but this
wasn't done publicly. This can create the impression that community
involvement is seen as an afterthought rather than an integral part of the
process (and the warning in red at the top that 'this document is public'
doesn't really do anything to counter that impression :)).

On Fri, Jun 8, 2018 at 9:54 AM, Todd Lipcon <to...@cloudera.com> wrote:

> On Thu, Jun 7, 2018 at 9:27 AM, Marcel Kornacker <ma...@gmail.com>
> wrote:
>
>> Hey Todd,
>>
>> thanks for putting that together, that area certainly needs some work.
>> I think there are a number of good ideas in the proposal, and I also
>> think there are a number of ways to augment it to make it even better:
>> 1. The size of the cache is obviously a result of the caching
>> granularity (or lack thereof), and switching to a bounded size with
>> more granularity seems like a good choice.
>> 2. However, one of the biggest usability problems stems from the need
>> for manual metadata discovery (via Invalidate/Refresh). Introducing a
>> TTL for cached objects seems like a bad approach, because it creates a
>> conflict between metadata freshness and caching
>> effectiveness/performance. This isn't really something that users want
>> to have to trade off.
>>
>
> Agreed. In the "future directions" part of the document I included one
> item about removing the need for invalidate/refresh, perhaps by subscribing
> to the push notification mechanisms provided by HMS/NN/S3/etc. With those
> notification mechanisms, the cache TTL could be dialed way up (perhaps to
> infinity) and only rely on cache capacity for eviction.
>
> In some cases the notification mechanisms provide enough information to
> update the cache in place (i.e., the notification has the new information)
> whereas in others it would just contain enough information to perform
> invalidation. But from the user's perspective it achieves near-real-time
> freshness either way.
>
> Note, though, that absent transactional coordination between Impala and
> source systems, we'll always be in the realm of "soft guarantees". That is
> to say, the notification mechanisms are all asynchronous and could be
> backlogged due to a number of issues out of our control (e.g., momentary
> network congestion between HMS and Impala, or unknown issues on the S3/SNS
> backend). As you pointed out in the doc, "soft guarantees" are somewhat
> weak when you come at it from the perspective of an application developer.
> So, even with those mechanisms, we may need some explicit command like
> "sync metadata;" or somesuch that acts like a memory barrier.
>
> I think trying to bite all of that off in one go will increase the scope
> of the project quite a bit, though. Did you have some easier mechanism in
> mind to improve usability here that I'm missing?
>
>> 3. Removing the role of a central "metadata curator" effectively
>> closes the door on doing automated metadata discovery in the future
>> (which in turn seems like a prerequisite for services like automated
>> format conversion, i.e., anything that needs to react to the presence of
>> new data).
>>
>
> That's fair. Something like catalogd may still have its place to
> centralize the subscription to the notification services and re-broadcast
> them to the cluster as necessary. My thinking was that it looks
> sufficiently different from the current catalogd, though, that it would be
> a new component rather than an evolution of the existing
> catalogd-statestore-impalad metadata propagation protocol.
>
>
>> To address those concerns, I am proposing the following changes:
>> 1. Keep the role of a central metadata curator (let's call it metad
>> just for the sake of the argument, so as not to confuse it with the
>> existing catalogd).
>>
>
> Per above, I think this could be useful.
>
>
>> 2. Partition the namespace in some way (for instance, into a fixed
>> number of topics, with each database assigned to a topic) and allow
>> coordinators to join and drop topics as needed in order to stay within
>> their cache size bounds.
>>
>
> I don't think partitioning to the granularity of a database is sufficient.
> We've found that even individual tables can cause metadata updates in
> excess of 1GB, even though those tables are infrequently accessed, and,
> when they are accessed, often only a very small subset of their partitions
> is needed. Often these mega tables live in the same database as the rest of
> the tables accessed by a workload, and asking users to manually partition
> them out lets physical implementation concerns bleed into logical schema
> design decisions. That's quite a hassle when considering things like
> security policies being enforced on a per-database basis, right?
>

So then partition into a fixed number of topics and assign each *table* to
one? Or are you saying that caching granularity should be below the table
level? (Note that the actual granularity isn't spelled out in the
proposal.)


>
>> 3. The improvements you suggest for streamlining the code, avoiding
>> in-place updates, and making metadata retrieval more efficient still
>> apply here; they're just executed by a different process.
>>
>
> I think a critical piece of the design, though, is that the incremental
> and fine-grained metadata retrieval extends all the way down to the
> impalad, and not just to the centralized coordinator. If we are pushing
> metadata at database (or even table) granularity from the catalog to the
> impalads, we won't reap the major benefits described here.
>

Why would caching at the table level not work? Is the assumption that there
are lots of very large and very infrequently accessed tables around, so
that they would destroy the cache hit rate?

It would be good to get clarity on these scenarios. It feels like the
proposal is being targeted at some very specific use cases, but these
haven't been spelled out clearly. If these are extreme use cases
characterized by catalogs that are several orders of magnitude larger than
what an average catalog might look like, it would be good to recognize that
and be cognizant of the extent to which more middle-of-the-road use cases
will be impacted.


>
> Here's another approach we could consider long term:
> - keep a central daemon responsible for interacting with source systems
> - impalads don't directly access source systems, but instead send metadata
> fetch requests to the central coordinator
> - impalads maintain a small cache of their own, and the central
> coordinator maintains a larger cache. If the impalad requests some info
> that the central daemon doesn't have, it fetches on demand from the source.
> - central coordinator propagates invalidations to impalads
>
> In essence, it's like the current catalog design, but based on a
> fine-grained "pull" design instead of a course-grained "push/broadcast"
> design.
>

The problem with 'pull' designs is that they don't work well with
fine-grained metadata discovery (e.g., a new file arrives; your proposal
sounds like it would require an invalidation at the partition level, or at
the full table level in the case of an unpartitioned table). Given that
Refresh/Invalidate are one of the most problematic aspects of running
Impala in a production system, it would be good to evolve the handling of
metadata in a direction that doesn't preclude fine-grained and automated
metadata propagation in the future.


>
> This allows impalads to scale out without linearly increasing the amount
> of load on source systems. The downsides, though, are:
> - extra hop between impalad and source system on a cache miss
> - extra complexity in failure cases (the central daemon becomes a SPOF, and if
> we replicate it we have more complex consistency to worry about)
> - scalability benefits may only be really important on large clusters.
> Small clusters can often be served sufficiently by just one or two
> coordinator impalads.
>
> -Todd
>
>
>> On Tue, May 22, 2018 at 9:27 PM, Todd Lipcon <to...@cloudera.com> wrote:
>> > Hey Impala devs,
>> >
>> > Over the past 3 weeks I have been investigating various issues with
>> > Impala's treatment of metadata. Based on data from a number of user
>> > deployments, and after discussing the issues with a number of Impala
>> > contributors and committers, I've come up with a proposal for a new
>> design.
>> > I've also developed a prototype to show that the approach is workable
>> and
>> > is likely to achieve its goals.
>> >
>> > Rather than describe the design in duplicate, I've written up a proposal
>> > document here:
>> > https://docs.google.com/document/d/1WcUQ7nC3fzLFtZLofzO6kvWdGHFaaqh97fC_PvqVGCk/edit?ts=5b04a6b8#
>> >
>> > Please take a look and provide any input, questions, or concerns.
>> >
>> > Additionally, if any users on this list have experienced
>> metadata-related
>> > problems in the past and would be willing to assist in testing or
>> > contribute workloads, please feel free to respond to me either on or off
>> > list.
>> >
>> > Thanks
>> > Todd
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Proposal for a new approach to Impala metadata

Posted by Todd Lipcon <to...@cloudera.com>.
On Thu, Jun 7, 2018 at 9:27 AM, Marcel Kornacker <ma...@gmail.com> wrote:

> Hey Todd,
>
> thanks for putting that together, that area certainly needs some work.
> I think there are a number of good ideas in the proposal, and I also
> think there are a number of ways to augment it to make it even better:
> 1. The size of the cache is obviously a result of the caching
> granularity (or lack thereof), and switching to a bounded size with
> more granularity seems like a good choice.
> 2. However, one of the biggest usability problems stems from the need
> for manual metadata discovery (via Invalidate/Refresh). Introducing a
> TTL for cached objects seems like a bad approach, because it creates a
> conflict between metadata freshness and caching
> effectiveness/performance. This isn't really something that users want
> to have to trade off.
>

Agreed. In the "future directions" part of the document I included one item
about removing the need for invalidate/refresh, perhaps by subscribing to
the push notification mechanisms provided by HMS/NN/S3/etc. With those
notification mechanisms, the cache TTL could be dialed way up (perhaps to
infinity) and only rely on cache capacity for eviction.
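
As a minimal sketch of a capacity-bounded cache with no TTL (assuming
Guava's Cache here; the String/byte[] types and the 4GB bound are purely
illustrative):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

class MetadataCacheSketch {
  // Entries are evicted only when the total weight (approximated here by
  // each entry's serialized size) exceeds the bound; there is no TTL.
  static final Cache<String, byte[]> CACHE = CacheBuilder.newBuilder()
      .maximumWeight(4L << 30)                    // e.g. a 4GB cap
      .weigher((String k, byte[] v) -> v.length)  // weigh by payload size
      .build();
}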

In some cases the notification mechanisms provide enough information to
update the cache in place (i.e., the notification has the new information)
whereas in others it would just contain enough information to perform
invalidation. But from the user's perspective it achieves near-real-time
freshness either way.
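
Continuing the sketch above, a hypothetical handler for such a notification
(the event shape is an assumption; real HMS/S3 notifications look different):

// If the event carries the new metadata, update the cache in place;
// otherwise invalidate so that the next access re-fetches from the source.
static void onNotification(String key, byte[] newValueOrNull) {
  if (newValueOrNull != null) {
    MetadataCacheSketch.CACHE.put(key, newValueOrNull);
  } else {
    MetadataCacheSketch.CACHE.invalidate(key);
  }
}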

Note, though, that absent transactional coordination between Impala and
source systems, we'll always be in the realm of "soft guarantees". That is
to say, the notification mechanisms are all asynchronous and could be
backlogged due to a number of issues out of our control (e.g., momentary
network congestion between HMS and Impala, or unknown issues on the S3/SNS
backend). As you pointed out in the doc, "soft guarantees" are somewhat
weak when you come at it from the perspective of an application developer.
So, even with those mechanisms, we may need some explicit command like
"sync metadata;" or somesuch that acts like a memory barrier.

I think trying to bite all of that off in one go will increase the scope of
the project quite a bit, though. Did you have some easier mechanism in mind
to improve usability here that I'm missing?

> 3. Removing the role of a central "metadata curator" effectively
> closes the door on doing automated metadata discovery in the future
> (which in turn seems like a prerequisite for services like automated
> format conversion, i.e., anything that needs to react to the presence of
> new data).
>

That's fair. Something like catalogd may still have its place to centralize
the subscription to the notification services and re-broadcast them to the
cluster as necessary. My thinking was that it looks sufficiently different
from the current catalogd, though, that it would be a new component rather
than an evolution of the existing catalogd-statestore-impalad metadata
propagation protocol.


> To address those concerns, I am proposing the following changes:
> 1. Keep the role of a central metadata curator (let's call it metad
> just for the sake of the argument, so as not to confuse it with the
> existing catalogd).
>

Per above, I think this could be useful.


> 2. Partition the namespace in some way (for instance, into a fixed
> number of topics, with each database assigned to a topic) and allow
> coordinators to join and drop topics as needed in order to stay within
> their cache size bounds.
>

I don't think partitioning to the granularity of a database is sufficient.
We've found that even individual tables can cause metadata updates in
excess of 1GB, even though those tables are infrequently accessed, and,
when they are accessed, often only a very small subset of their partitions
is needed. Often these mega tables live in the same database as the rest of
the tables accessed by a workload, and asking users to manually partition
them out lets physical implementation concerns bleed into logical schema
design decisions. That's quite a hassle when considering things like
security policies being enforced on a per-database basis, right?
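
For illustration, caching below the table level could key entries per
partition (this key format is hypothetical), so that touching three
partitions of a 100,000-partition table loads three small entries rather
than the table's full metadata:

class PartitionKeySketch {
  // One cache entry per partition rather than per table or per database.
  static String partitionCacheKey(String db, String table, String partName) {
    // e.g. "sales.events/year=2018/month=6"
    return db + "." + table + "/" + partName;
  }
}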

> 3. The improvements you suggest for streamlining the code, avoiding
> in-place updates, and making metadata retrieval more efficient still
> apply here; they're just executed by a different process.
>

I think a critical piece of the design, though, is that the incremental and
fine-grained metadata retrieval extends all the way down to the impalad,
and not just to the centralized coordinator. If we are pushing metadata at
database (or even table) granularity from the catalog to the impalads, we
won't reap the major benefits described here.

Here's another approach we could consider long term:
- keep a central daemon responsible for interacting with source systems
- impalads don't directly access source systems, but instead send metadata
fetch requests to the central coordinator
- impalads maintain a small cache of their own, and the central coordinator
maintains a larger cache. If the impalad requests some info that the
central daemon doesn't have, it fetches on demand from the source.
- central coordinator propagates invalidations to impalads

In essence, it's like the current catalog design, but based on a
fine-grained "pull" design instead of a course-grained "push/broadcast"
design.
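
Sketched as an interface, the pull path might look something like the
following (the method and field names are assumptions for illustration, not
an actual Impala RPC definition):

import java.util.List;

// impalad -> central daemon: fetch only the named objects. The daemon
// serves from its own cache, or fetches on demand from the source on a miss.
interface MetadataService {
  FetchResponse fetch(FetchRequest request);
}

class FetchRequest {
  String db;
  String table;
  List<String> partitionNames;  // empty = table-level metadata only
}

class FetchResponse {
  byte[] serializedMetadata;    // opaque, versioned metadata payload
  long versionId;               // lets the impalad detect staleness
}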

This allows impalads to scale out without linearly increasing the amount of
load on source systems. The downsides, though, are:
- extra hop between impalad and source system on a cache miss
- extra complexity in failure cases (the central daemon becomes a SPOF, and if
we replicate it we have more complex consistency to worry about)
- scalability benefits may only be really important on large clusters.
Small clusters can often be served sufficiently by just one or two
coordinator impalads.

-Todd


> On Tue, May 22, 2018 at 9:27 PM, Todd Lipcon <to...@cloudera.com> wrote:
> > Hey Impala devs,
> >
> > Over the past 3 weeks I have been investigating various issues with
> > Impala's treatment of metadata. Based on data from a number of user
> > deployments, and after discussing the issues with a number of Impala
> > contributors and committers, I've come up with a proposal for a new
> design.
> > I've also developed a prototype to show that the approach is workable and
> > is likely to achieve its goals.
> >
> > Rather than describe the design in duplicate, I've written up a proposal
> > document here:
> > https://docs.google.com/document/d/1WcUQ7nC3fzLFtZLofzO6kvWdGHFaaqh97fC_PvqVGCk/edit?ts=5b04a6b8#
> >
> > Please take a look and provide any input, questions, or concerns.
> >
> > Additionally, if any users on this list have experienced metadata-related
> > problems in the past and would be willing to assist in testing or
> > contribute workloads, please feel free to respond to me either on or off
> > list.
> >
> > Thanks
> > Todd
>



-- 
Todd Lipcon
Software Engineer, Cloudera