Posted to dev@impala.apache.org by Antoni Ivanov <ai...@vmware.com> on 2017/04/05 04:24:26 UTC

Hi and a few Impala design questions :)

Hi, 
I've been reading about the design of the catalog service and the statestore,
mostly from the Impala white paper - http://cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf
I got it from the Impala Confluence wiki: https://cwiki.apache.org/confluence/display/IMPALA/Impala+Presentations%2C+Papers+and+Blog+Posts
It's rather interesting - it gives a fairly detailed (but clear) design of the different components.
 
Question 1: Are there other sources (apart from the source code)?

Question 2: I've been wondering whether impalad caches file locations itself -
they don't seem to be stored in the Hive metastore. Only the partition location
is there, right?


Re: Hi and a few Impala design questions :)

Posted by Lars Volker <lv...@cloudera.com>.
There is also this page, which has another paper published by the Impala
team, as well as other related materials:
https://cwiki.apache.org/confluence/display/IMPALA/Impala+Reading+List


On Wed, Apr 5, 2017 at 7:02 PM, Dimitris Tsirogiannis <
dtsirogiannis@cloudera.com> wrote:

> Hi Antoni,
>
> Regarding question 2: the catalog server collects file metadata, including
> block locations, from the HDFS NameNode and caches it in memory. Over time,
> the file metadata is broadcast through the statestore to all the Impala
> servers and stored in their local metadata caches.
>
> Dimitris
>

Re: Hi and a few Impala design questions :)

Posted by Dimitris Tsirogiannis <dt...@cloudera.com>.
Hi Antoni,

Regarding question 2: the catalog server collects file metadata, including
block locations, from the HDFS NameNode and caches it in memory. Over time,
the file metadata is broadcast through the statestore to all the Impala
servers and stored in their local metadata caches.
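
As a rough illustration of the flow described above, here is a toy Python
sketch. It is not Impala's actual code: every class, method and path below
(FileMeta, Statestore, ImpalaDaemon, CatalogServer, load_table, the "sales"
table) is hypothetical, the NameNode is replaced by a plain dict, and the
broadcast is done synchronously, whereas the real system propagates updates
over time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical record: one file plus the hosts holding its blocks."""
    path: str
    block_hosts: tuple


class Statestore:
    """Toy pub/sub hub that forwards each update to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, daemon):
        self.subscribers.append(daemon)

    def broadcast(self, table, files):
        for daemon in self.subscribers:
            daemon.apply_update(table, files)


class ImpalaDaemon:
    """Stand-in for an Impala server with a local metadata cache."""

    def __init__(self, name):
        self.name = name
        self.local_cache = {}

    def apply_update(self, table, files):
        # Keep a private copy so each daemon owns its cached metadata.
        self.local_cache[table] = list(files)


class CatalogServer:
    """Collects file metadata, caches it, and publishes it."""

    def __init__(self, namenode, statestore):
        self.namenode = namenode      # assumption: dict of table -> [FileMeta]
        self.statestore = statestore
        self.cache = {}

    def load_table(self, table):
        files = self.namenode[table]             # "ask the NameNode"
        self.cache[table] = files                # cache in memory
        self.statestore.broadcast(table, files)  # push to all daemons


# Usage: one table, two daemons.
namenode = {
    "sales": [FileMeta("/warehouse/sales/part-0.parq", ("host1", "host2"))],
}
statestore = Statestore()
daemons = [ImpalaDaemon("impalad-1"), ImpalaDaemon("impalad-2")]
for d in daemons:
    statestore.subscribe(d)

catalog = CatalogServer(namenode, statestore)
catalog.load_table("sales")
```

After load_table runs, each daemon in this sketch holds the block hosts for
the table in its own local_cache, so it could plan reads without contacting
the metadata source again.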

Dimitris
