Posted to dev@impala.apache.org by Piotr Żukowski <pi...@gmail.com> on 2022/04/07 12:58:44 UTC

Kudu+Impala without HMS?

Hi!

I asked the question below in the Kudu Slack and was redirected to this
mailing list.

Question:

Hi guys! I'm new to Kudu, so this question may be dumb, but I can't find
an answer anywhere. I'll be designing a data mart (dimensional schema) in
Kudu for my thesis. I found out that I need Apache Impala to get a SQL-like
interface and to connect Kudu to visualization software like Power BI or
Superset. I'm using "normal" Kudu, not the Cloudera version.

The question is: do I need to install Hive and its metastore if I only want
to access Kudu via Impala? Kudu is a separate data store, so if I understand
all these technologies correctly, Hive and the metastore are redundant, since
the data will not be stored in HDFS. Because of that, I intuitively feel
that there is a setup that omits the Hive and metastore installation.

Sorry if something isn't clear enough. Let me know and I will clarify it as
best as I can.
Thanks in advance for the answer :)
Piotr

Re: Kudu+Impala without HMS?

Posted by Zoltán Borók-Nagy <bo...@cloudera.com>.
Hi Piotr,

Thank you for reaching out. The answer is yes, you need to install a Hive
Metastore (HMS) to use Impala. But you don't need the other parts of Hive,
e.g. HiveServer2.
The Hive Metastore stores table metadata for all kinds of tables, not just
tables stored on a filesystem/object store.
Impala learns which tables exist by asking the Hive Metastore. Impala doesn't
know about the existence of Kudu tables if there is no information about
them in the HMS.

Cheers,
    Zoltan
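
To make the point about HMS registration concrete, here is a minimal sketch
of the kind of statement that tells Impala (and therefore the HMS) about a
Kudu table that already exists. The helper name external_kudu_table_ddl and
the table names are hypothetical and only for illustration. The sketch just
builds the DDL text; it does not connect to anything. Running the statement
would still need a working Impala with an HMS behind it, and Impala would
also need to know where the Kudu masters are.

```python
def external_kudu_table_ddl(impala_table: str, kudu_table: str) -> str:
    """Build an Impala DDL statement that maps an existing Kudu table.

    Both arguments are illustrative: impala_table is the name the table
    gets in Impala/HMS, kudu_table is the name the table has inside Kudu.
    No quoting or escaping is done, so pass trusted names only.
    """
    return (
        f"CREATE EXTERNAL TABLE {impala_table}\n"
        "STORED AS KUDU\n"
        f"TBLPROPERTIES ('kudu.table_name' = '{kudu_table}');"
    )


if __name__ == "__main__":
    # Hypothetical names; paste the printed statement into impala-shell.
    print(external_kudu_table_ddl("sales_fact", "sales_fact"))
```

Once a statement like this has been executed, the table metadata is in the
HMS and the table shows up in SHOW TABLES; without that entry Impala has no
way of seeing the Kudu table.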
