Posted to user@impala.apache.org by Fawze Abujaber <fa...@gmail.com> on 2018/09/21 12:32:10 UTC

Slow DDL

Hello Community,

I'm investigating an issue we are running into with Impala DDL statements,
which sometimes take 6-9 minutes or more to complete.

We have around 144 Impala tables partitioned by YYYY/MM/DD.
We keep between 3 and 13 months of data, depending on the table, and we run
3 different DDL statements against them (a rough sketch follows below):

---> ALTER TABLE ... RECOVER PARTITIONS every 20 minutes, to pick up new
data generated by a Spark job and written to HDFS.

---> DROP and re-CREATE the table twice a day, to pick up schema changes in
the data.
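
For reference, the statements look roughly like the following; my_db.my_table,
the columns, and the LOCATION are placeholders rather than our real schema:

  -- every 20 minutes per table: discover new YYYY/MM/DD directories
  ALTER TABLE my_db.my_table RECOVER PARTITIONS;

  -- twice a day per table: recreate the table to pick up schema changes,
  -- then re-discover the existing partitions
  DROP TABLE IF EXISTS my_db.my_table;
  CREATE EXTERNAL TABLE my_db.my_table (
    id BIGINT,
    payload STRING
  )
  PARTITIONED BY (year INT, month INT, day INT)
  STORED AS PARQUET
  LOCATION '/data/my_db/my_table';
  ALTER TABLE my_db.my_table RECOVER PARTITIONS;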

I don't see the issue occurring on a specific table or a specific Impala
daemon.

On the other hand, we have around 450 Hive tables against which we run the
same DDL statements using Hive.

I've been trying to find a way to investigate this, with no success. For
example, I want to check the size of the metadata stored on each daemon, to
see whether my issue is related to the metadata size or not, but I'm not
aware of how to check this.
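
For instance, would the /catalog page of the daemon debug web UIs be the
right place to look (assuming the default web UI ports)?

  http://impalad-host:25000/catalog      -- each impalad's cached view
  http://catalogd-host:25020/catalog     -- the catalog service itself

I'm not sure whether what is shown there reflects the actual metadata size
held in memory.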

Any suggestions on how to investigate this issue are much appreciated.

-- 
Take Care
Fawze Abujaber

Re: Slow DDL

Posted by Tim Armstrong <ta...@cloudera.com>.
It does sound a lot like https://issues.apache.org/jira/browse/IMPALA-5058
or https://issues.apache.org/jira/browse/IMPALA-6671 - the catalog tries to
maintain some kind of consistency for operations, but that means that
long-running operations can end up blocking others. I'm not involved, but I
know some other people are rearchitecting parts of the catalog to avoid
issues like this.
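
One rough way to see where the time is going (the table name below is just a
placeholder) is to reproduce one of the slow DDLs interactively in
impala-shell and then pull up its profile; the timeline in the profile gives
a breakdown of where the statement spent its time:

  -- reproduce one of the slow statements
  ALTER TABLE my_db.my_table RECOVER PARTITIONS;
  -- then, in the same impala-shell session, dump the profile of that statement
  PROFILE;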
