You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Iulian Mongescu <iu...@grx.ro> on 2019/05/15 17:33:50 UTC

Any HIVE DDL statement takes minutes to execute

Hello,

I'm working on a HDP-2.6.5.0 cluster with kerberos enabled and I have a problem with hive as any DDL statement that I run takes minutes to execute but any DML run in normal limits. I checked the logs but I didn't find anything that seems related with this problem and I would appreciate any help to debug this issue.

Please find bellow some examples with DDL&DML queries and their durations:


------------------------------------

0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1 (...) STORED AS ORC LOCATION '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';

No rows affected (184.191 seconds)

------------------------------------

0: jdbc:hive2://hdpx03:10000/> show tables;

+-----------+--+

| tab_name |

+-----------+--+

| agenti1 |

+-----------+--+

1 row selected (0.358 seconds)

---------------------------------

0: jdbc:hive2://hdpx03:10000/> select count(*) as total from agenti1 where 1;

INFO : Tez session hasn't been created yet. Opening session

INFO : Dag name: select count(*) as total from agenti1 wh...1(Stage-1)

INFO : Status: Running (Executing on YARN cluster with App id application_1552674174918_0002)



--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 .......... SUCCEEDED 1 1 0 0 0 0

Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.48 s

--------------------------------------------------------------------------------

+--------+--+

| total |

+--------+--+

| 1960 |

+--------+--+

1 row selected (15.853 seconds)



-------------------------------------------------------

0: jdbc:hive2://hdpx03:10000/> drop table agenti1;

No rows affected (184.164 seconds)

--------------------------------------------------------

0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1 (...) STORED AS ORC LOCATION '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';

No rows affected (190.288 seconds)

Thanks,

Iulian


Re: Any HIVE DDL statement takes minutes to execute

Posted by "Proust (Feng Guizhou) [FDS Marketing]" <pf...@coupang.com>.
How about disable kerberos and give a try? The time cost looks like waiting for something to timeout, for example DNS reverse lookup.

________________________________
From: Alan Gates <al...@gmail.com>
Sent: Friday, May 17, 2019 8:05:04 AM
To: user@hive.apache.org
Subject: Re: Any HIVE DDL statement takes minutes to execute

[Warning]: This email originated from an external source. Do not open links or attachments unless you know the content is safe.
[경고]: 본 이메일은 회사 외부에서 유입되었습니다. 내용이 안전한지 확인하기 전까지는 링크나 첨부파일을 열지 마십시오.

I can think of two things that could take a long time in creating a table, database operations or file system operations.  The perf timers inside the metastore only measure the entire metadata operation, not the file part and the db part, so it will be hard to tell where the time is being spent.  When a table is first created the metastore prints a debug message to the logs that says "create_table" (you have to having logging set to DEBUG to see this).  This will tell you when the metastore started processing the create table.  Between creating the directory for the table and connecting to the RDBMS to create the entry for it, the createtime for the table is set.  A describe table extended should show you the create time of the table (or you can directly query the TBLS table in the RDBMS to find it as well).  Finally, when the metastore is done creating the table there is another entry in the log that starts with "create_table".  All three of these timestamps are generated on the same machine, so clock syncing won't be an issue.  These three timestamps should give you an idea of whether the majority of the time is being spent creating the directory or creating an entry in the database.

Which logs you need to look in to find the debug statements depends on whether you have a separate Hive Metastore Thrift service running or you have Hive Server2 directly communicating with the RDBMS.

Alan.

On Thu, May 16, 2019 at 1:42 AM Iulian Mongescu <iu...@grx.ro>> wrote:

Hi Mich,



First thank  you for taking the time to look over this problem. Now regarding the questions :



  1.  I can confirm there are no locks on metastore DB ;
  2.  About duration of the queries, in my previous mail I just gave some examples and I  can confirm that I run those queries on the metastore db server and also from the hive node that I’m using to test and the results are similar, almost instant response on all queries;
  3.  And yes, this apply only on DDL statements and is constant problem, not a random delay;
  4.  Regarding the network communication blocking, there is no firewall or a network performance issue between hive node and metastore db. As I said at the previous point, I run all queries also manually using mysql cmd client from the hive node and the response was almost instant;



Thank you,

Iulian



From: Mich Talebzadeh <mi...@gmail.com>>
Sent: Thursday, May 16, 2019 11:20 AM
To: user <us...@hive.apache.org>>
Subject: Re: Any HIVE DDL statement takes minutes to execute




Dr Mich Talebzadeh



LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw<https://kor01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.linkedin.com%2Fprofile%2Fview%3Fid%3DAAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw&data=02%7C01%7Cpfeng%40coupang.com%7C2206927930314b82742608d6da5b60b4%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C636936483336458998&sdata=1YdUBCcyHDkJCSYsszG012O1coPtA4GMXIzah1LxpvM%3D&reserved=0>



http://talebzadehmich.wordpress.com<https://kor01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ftalebzadehmich.wordpress.com&data=02%7C01%7Cpfeng%40coupang.com%7C2206927930314b82742608d6da5b60b4%7Ce3098f96361b47c6a9f4ab7bafcaffe9%7C0%7C0%7C636936483336458998&sdata=M%2BopViIO8TSkYqj3LPjLDj4fpj4bnsOyCKCQvSVb%2FMc%3D&reserved=0>



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.



I don't know much about MySQL but assuming it has tools to see the activity in the back end, what locks are you seeing in the database itself plus the duration of time that the command is executed on RDBMS etc.



Does this only apply to the DDL statements?



It is either some locking/blocking in the back end or the network connection between your Hadoop and the RDBMS causing the issue



I just tested DDL for external table in Hive through Oracle database and there was no issue.



HTH





On Thu, 16 May 2019 at 08:16, Iulian Mongescu <iu...@grx.ro>> wrote:

Hi Alan,



I’m using MySQL (Mariadb) for the metastore and I was thinking on this possibility too but from all my tests on metastore database that I run, every query is almost instant.

For example :

SELECT * FROM `TBLS`  ->  Query took 0.0001 seconds.

INSERT INTO `TBLS` ->  Query took 0.0020 seconds

DELETE FROM `TBLS` -> Query took 0.0021 seconds



Thank you,

Iulian



From: Alan Gates <al...@gmail.com>>
Sent: Wednesday, May 15, 2019 9:51 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: Any HIVE DDL statement takes minutes to execute



What are you using as the RDBMS for your metastore?  A first place I'd look is if the communications with the RDBMS are slow for some reason.



Alan.



On Wed, May 15, 2019 at 10:34 AM Iulian Mongescu <iu...@grx.ro>> wrote:

Hello,



I'm working on a HDP-2.6.5.0 cluster with kerberos enabled and I have a problem with hive as any DDL statement that I run takes minutes to execute but any DML run in normal limits. I checked the logs but I didn’t find anything that seems related with this problem and I would appreciate any help to debug this issue.



Please find bellow some examples with DDL&DML queries and their durations:



------------------------------------

0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1 (...) STORED AS ORC LOCATION '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';

No rows affected (184.191 seconds)

------------------------------------

0: jdbc:hive2://hdpx03:10000/> show tables;

+-----------+--+

| tab_name |

+-----------+--+

| agenti1 |

+-----------+--+

1 row selected (0.358 seconds)

---------------------------------

0: jdbc:hive2://hdpx03:10000/> select count(*) as total from agenti1 where 1;

INFO : Tez session hasn't been created yet. Opening session

INFO : Dag name: select count(*) as total from agenti1 wh...1(Stage-1)

INFO : Status: Running (Executing on YARN cluster with App id application_1552674174918_0002)



--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 .......... SUCCEEDED 1 1 0 0 0 0

Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.48 s

--------------------------------------------------------------------------------

+--------+--+

| total |

+--------+--+

| 1960 |

+--------+--+

1 row selected (15.853 seconds)



-------------------------------------------------------

0: jdbc:hive2://hdpx03:10000/> drop table agenti1;

No rows affected (184.164 seconds)

--------------------------------------------------------

0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1 (...) STORED AS ORC LOCATION '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';

No rows affected (190.288 seconds)



Thanks,



Iulian



Re: Any HIVE DDL statement takes minutes to execute

Posted by Alan Gates <al...@gmail.com>.
I can think of two things that could take a long time in creating a table,
database operations or file system operations.  The perf timers inside the
metastore only measure the entire metadata operation, not the file part and
the db part, so it will be hard to tell where the time is being spent.
When a table is first created the metastore prints a debug message to the
logs that says "create_table" (you have to having logging set to DEBUG to
see this).  This will tell you when the metastore started processing the
create table.  Between creating the directory for the table and connecting
to the RDBMS to create the entry for it, the createtime for the table is
set.  A describe table extended should show you the create time of the
table (or you can directly query the TBLS table in the RDBMS to find it as
well).  Finally, when the metastore is done creating the table there is
another entry in the log that starts with "create_table".  All three of
these timestamps are generated on the same machine, so clock syncing won't
be an issue.  These three timestamps should give you an idea of whether the
majority of the time is being spent creating the directory or creating an
entry in the database.

Which logs you need to look in to find the debug statements depends on
whether you have a separate Hive Metastore Thrift service running or you
have Hive Server2 directly communicating with the RDBMS.

Alan.

On Thu, May 16, 2019 at 1:42 AM Iulian Mongescu <iu...@grx.ro>
wrote:

> Hi Mich,
>
>
>
> First thank  you for taking the time to look over this problem. Now
> regarding the questions :
>
>
>
>    1. I can confirm there are no locks on metastore DB ;
>    2. About duration of the queries, in my previous mail I just gave some
>    examples and I  can confirm that I run those queries on the metastore db
>    server and also from the hive node that I’m using to test and the results
>    are similar, almost instant response on all queries;
>    3. And yes, this apply only on DDL statements and is constant problem,
>    not a random delay;
>    4. Regarding the network communication blocking, there is no firewall
>    or a network performance issue between hive node and metastore db. As I
>    said at the previous point, I run all queries also manually using mysql cmd
>    client from the hive node and the response was almost instant;
>
>
>
> Thank you,
>
> Iulian
>
>
>
> *From:* Mich Talebzadeh <mi...@gmail.com>
> *Sent:* Thursday, May 16, 2019 11:20 AM
> *To:* user <us...@hive.apache.org>
> *Subject:* Re: Any HIVE DDL statement takes minutes to execute
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn  *https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> I don't know much about MySQL but assuming it has tools to see the
> activity in the back end, what locks are you seeing in the database itself
> plus the duration of time that the command is executed on RDBMS etc.
>
>
>
> Does this only apply to the DDL statements?
>
>
>
> It is either some locking/blocking in the back end or the network
> connection between your Hadoop and the RDBMS causing the issue
>
>
>
> I just tested DDL for external table in Hive through Oracle database and
> there was no issue.
>
>
>
> HTH
>
>
>
>
>
> On Thu, 16 May 2019 at 08:16, Iulian Mongescu <iu...@grx.ro>
> wrote:
>
> Hi Alan,
>
>
>
> I’m using MySQL (Mariadb) for the metastore and I was thinking on this
> possibility too but from all my tests on metastore database that I run,
> every query is almost instant.
>
> For example :
>
> SELECT * FROM `TBLS`  ->  Query took 0.0001 seconds.
>
> INSERT INTO `TBLS` ->  Query took 0.0020 seconds
>
> DELETE FROM `TBLS` -> Query took 0.0021 seconds
>
>
>
> Thank you,
>
> Iulian
>
>
>
> *From:* Alan Gates <al...@gmail.com>
> *Sent:* Wednesday, May 15, 2019 9:51 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Any HIVE DDL statement takes minutes to execute
>
>
>
> What are you using as the RDBMS for your metastore?  A first place I'd
> look is if the communications with the RDBMS are slow for some reason.
>
>
>
> Alan.
>
>
>
> On Wed, May 15, 2019 at 10:34 AM Iulian Mongescu <iu...@grx.ro>
> wrote:
>
> Hello,
>
>
>
> I'm working on a HDP-2.6.5.0 cluster with kerberos enabled and I have a
> problem with hive as any DDL statement that I run takes minutes to execute
> but any DML run in normal limits. I checked the logs but I didn’t find
> anything that seems related with this problem and I would appreciate any
> help to debug this issue.
>
>
>
> Please find bellow some examples with DDL&DML queries and their durations:
>
>
>
> ------------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1
> (...) STORED AS ORC LOCATION
> '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';
>
> No rows affected (184.191 seconds)
>
> ------------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> show tables;
>
> +-----------+--+
>
> | tab_name |
>
> +-----------+--+
>
> | agenti1 |
>
> +-----------+--+
>
> 1 row selected (0.358 seconds)
>
> ---------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> select count(*) as total from agenti1 where
> 1;
>
> INFO : Tez session hasn't been created yet. Opening session
>
> INFO : Dag name: select count(*) as total from agenti1 wh...1(Stage-1)
>
> INFO : Status: Running (Executing on YARN cluster with App id
> application_1552674174918_0002)
>
>
>
>
> --------------------------------------------------------------------------------
>
> VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
>
>
> --------------------------------------------------------------------------------
>
> Map 1 .......... SUCCEEDED 1 1 0 0 0 0
>
> Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
>
>
> --------------------------------------------------------------------------------
>
> VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.48 s
>
>
> --------------------------------------------------------------------------------
>
> +--------+--+
>
> | total |
>
> +--------+--+
>
> | 1960 |
>
> +--------+--+
>
> 1 row selected (15.853 seconds)
>
>
>
> -------------------------------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> drop table agenti1;
>
> No rows affected (184.164 seconds)
>
> --------------------------------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1
> (...) STORED AS ORC LOCATION
> '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';
>
> No rows affected (190.288 seconds)
>
>
>
> Thanks,
>
>
>
> Iulian
>
>
>
>

RE: Any HIVE DDL statement takes minutes to execute

Posted by Iulian Mongescu <iu...@grx.ro>.
Hi Mich,

First thank  you for taking the time to look over this problem. Now regarding the questions :


  1.  I can confirm there are no locks on metastore DB ;
  2.  About duration of the queries, in my previous mail I just gave some examples and I  can confirm that I run those queries on the metastore db server and also from the hive node that I’m using to test and the results are similar, almost instant response on all queries;
  3.  And yes, this apply only on DDL statements and is constant problem, not a random delay;
  4.  Regarding the network communication blocking, there is no firewall or a network performance issue between hive node and metastore db. As I said at the previous point, I run all queries also manually using mysql cmd client from the hive node and the response was almost instant;

Thank you,
Iulian

From: Mich Talebzadeh <mi...@gmail.com>
Sent: Thursday, May 16, 2019 11:20 AM
To: user <us...@hive.apache.org>
Subject: Re: Any HIVE DDL statement takes minutes to execute



Dr Mich Talebzadeh



LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.


I don't know much about MySQL but assuming it has tools to see the activity in the back end, what locks are you seeing in the database itself plus the duration of time that the command is executed on RDBMS etc.

Does this only apply to the DDL statements?

It is either some locking/blocking in the back end or the network connection between your Hadoop and the RDBMS causing the issue

I just tested DDL for external table in Hive through Oracle database and there was no issue.

HTH


On Thu, 16 May 2019 at 08:16, Iulian Mongescu <iu...@grx.ro>> wrote:
Hi Alan,

I’m using MySQL (Mariadb) for the metastore and I was thinking on this possibility too but from all my tests on metastore database that I run, every query is almost instant.
For example :
SELECT * FROM `TBLS`  ->  Query took 0.0001 seconds.
INSERT INTO `TBLS` ->  Query took 0.0020 seconds
DELETE FROM `TBLS` -> Query took 0.0021 seconds

Thank you,
Iulian

From: Alan Gates <al...@gmail.com>>
Sent: Wednesday, May 15, 2019 9:51 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: Any HIVE DDL statement takes minutes to execute

What are you using as the RDBMS for your metastore?  A first place I'd look is if the communications with the RDBMS are slow for some reason.

Alan.

On Wed, May 15, 2019 at 10:34 AM Iulian Mongescu <iu...@grx.ro>> wrote:
Hello,

I'm working on a HDP-2.6.5.0 cluster with kerberos enabled and I have a problem with hive as any DDL statement that I run takes minutes to execute but any DML run in normal limits. I checked the logs but I didn’t find anything that seems related with this problem and I would appreciate any help to debug this issue.

Please find bellow some examples with DDL&DML queries and their durations:


------------------------------------

0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1 (...) STORED AS ORC LOCATION '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';

No rows affected (184.191 seconds)

------------------------------------

0: jdbc:hive2://hdpx03:10000/> show tables;

+-----------+--+

| tab_name |

+-----------+--+

| agenti1 |

+-----------+--+

1 row selected (0.358 seconds)

---------------------------------

0: jdbc:hive2://hdpx03:10000/> select count(*) as total from agenti1 where 1;

INFO : Tez session hasn't been created yet. Opening session

INFO : Dag name: select count(*) as total from agenti1 wh...1(Stage-1)

INFO : Status: Running (Executing on YARN cluster with App id application_1552674174918_0002)



--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 .......... SUCCEEDED 1 1 0 0 0 0

Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.48 s

--------------------------------------------------------------------------------

+--------+--+

| total |

+--------+--+

| 1960 |

+--------+--+

1 row selected (15.853 seconds)



-------------------------------------------------------

0: jdbc:hive2://hdpx03:10000/> drop table agenti1;

No rows affected (184.164 seconds)

--------------------------------------------------------

0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1 (...) STORED AS ORC LOCATION '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';

No rows affected (190.288 seconds)

Thanks,

Iulian


Re: Any HIVE DDL statement takes minutes to execute

Posted by Mich Talebzadeh <mi...@gmail.com>.
Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


I don't know much about MySQL but assuming it has tools to see the activity
in the back end, what locks are you seeing in the database itself plus the
duration of time that the command is executed on RDBMS etc.

Does this only apply to the DDL statements?

It is either some locking/blocking in the back end or the network
connection between your Hadoop and the RDBMS causing the issue

I just tested DDL for external table in Hive through Oracle database and
there was no issue.

HTH


On Thu, 16 May 2019 at 08:16, Iulian Mongescu <iu...@grx.ro>
wrote:

> Hi Alan,
>
>
>
> I’m using MySQL (Mariadb) for the metastore and I was thinking on this
> possibility too but from all my tests on metastore database that I run,
> every query is almost instant.
>
> For example :
>
> SELECT * FROM `TBLS`  ->  Query took 0.0001 seconds.
>
> INSERT INTO `TBLS` ->  Query took 0.0020 seconds
>
> DELETE FROM `TBLS` -> Query took 0.0021 seconds
>
>
>
> Thank you,
>
> Iulian
>
>
>
> *From:* Alan Gates <al...@gmail.com>
> *Sent:* Wednesday, May 15, 2019 9:51 PM
> *To:* user@hive.apache.org
> *Subject:* Re: Any HIVE DDL statement takes minutes to execute
>
>
>
> What are you using as the RDBMS for your metastore?  A first place I'd
> look is if the communications with the RDBMS are slow for some reason.
>
>
>
> Alan.
>
>
>
> On Wed, May 15, 2019 at 10:34 AM Iulian Mongescu <iu...@grx.ro>
> wrote:
>
> Hello,
>
>
>
> I'm working on a HDP-2.6.5.0 cluster with kerberos enabled and I have a
> problem with hive as any DDL statement that I run takes minutes to execute
> but any DML run in normal limits. I checked the logs but I didn’t find
> anything that seems related with this problem and I would appreciate any
> help to debug this issue.
>
>
>
> Please find bellow some examples with DDL&DML queries and their durations:
>
>
>
> ------------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1
> (...) STORED AS ORC LOCATION
> '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';
>
> No rows affected (184.191 seconds)
>
> ------------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> show tables;
>
> +-----------+--+
>
> | tab_name |
>
> +-----------+--+
>
> | agenti1 |
>
> +-----------+--+
>
> 1 row selected (0.358 seconds)
>
> ---------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> select count(*) as total from agenti1 where
> 1;
>
> INFO : Tez session hasn't been created yet. Opening session
>
> INFO : Dag name: select count(*) as total from agenti1 wh...1(Stage-1)
>
> INFO : Status: Running (Executing on YARN cluster with App id
> application_1552674174918_0002)
>
>
>
>
> --------------------------------------------------------------------------------
>
> VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
>
>
> --------------------------------------------------------------------------------
>
> Map 1 .......... SUCCEEDED 1 1 0 0 0 0
>
> Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
>
>
> --------------------------------------------------------------------------------
>
> VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.48 s
>
>
> --------------------------------------------------------------------------------
>
> +--------+--+
>
> | total |
>
> +--------+--+
>
> | 1960 |
>
> +--------+--+
>
> 1 row selected (15.853 seconds)
>
>
>
> -------------------------------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> drop table agenti1;
>
> No rows affected (184.164 seconds)
>
> --------------------------------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1
> (...) STORED AS ORC LOCATION
> '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';
>
> No rows affected (190.288 seconds)
>
>
>
> Thanks,
>
>
>
> Iulian
>
>
>
>

RE: Any HIVE DDL statement takes minutes to execute

Posted by Iulian Mongescu <iu...@grx.ro>.
Hi Alan,

I’m using MySQL (Mariadb) for the metastore and I was thinking on this possibility too but from all my tests on metastore database that I run, every query is almost instant.
For example :
SELECT * FROM `TBLS`  ->  Query took 0.0001 seconds.
INSERT INTO `TBLS` ->  Query took 0.0020 seconds
DELETE FROM `TBLS` -> Query took 0.0021 seconds

Thank you,
Iulian

From: Alan Gates <al...@gmail.com>
Sent: Wednesday, May 15, 2019 9:51 PM
To: user@hive.apache.org
Subject: Re: Any HIVE DDL statement takes minutes to execute

What are you using as the RDBMS for your metastore?  A first place I'd look is if the communications with the RDBMS are slow for some reason.

Alan.

On Wed, May 15, 2019 at 10:34 AM Iulian Mongescu <iu...@grx.ro>> wrote:
Hello,

I'm working on a HDP-2.6.5.0 cluster with kerberos enabled and I have a problem with hive as any DDL statement that I run takes minutes to execute but any DML run in normal limits. I checked the logs but I didn’t find anything that seems related with this problem and I would appreciate any help to debug this issue.

Please find bellow some examples with DDL&DML queries and their durations:


------------------------------------

0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1 (...) STORED AS ORC LOCATION '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';

No rows affected (184.191 seconds)

------------------------------------

0: jdbc:hive2://hdpx03:10000/> show tables;

+-----------+--+

| tab_name |

+-----------+--+

| agenti1 |

+-----------+--+

1 row selected (0.358 seconds)

---------------------------------

0: jdbc:hive2://hdpx03:10000/> select count(*) as total from agenti1 where 1;

INFO : Tez session hasn't been created yet. Opening session

INFO : Dag name: select count(*) as total from agenti1 wh...1(Stage-1)

INFO : Status: Running (Executing on YARN cluster with App id application_1552674174918_0002)



--------------------------------------------------------------------------------

VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED

--------------------------------------------------------------------------------

Map 1 .......... SUCCEEDED 1 1 0 0 0 0

Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0

--------------------------------------------------------------------------------

VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.48 s

--------------------------------------------------------------------------------

+--------+--+

| total |

+--------+--+

| 1960 |

+--------+--+

1 row selected (15.853 seconds)



-------------------------------------------------------

0: jdbc:hive2://hdpx03:10000/> drop table agenti1;

No rows affected (184.164 seconds)

--------------------------------------------------------

0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1 (...) STORED AS ORC LOCATION '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';

No rows affected (190.288 seconds)

Thanks,

Iulian


Re: Any HIVE DDL statement takes minutes to execute

Posted by Alan Gates <al...@gmail.com>.
What are you using as the RDBMS for your metastore?  A first place I'd look
is if the communications with the RDBMS are slow for some reason.

Alan.

On Wed, May 15, 2019 at 10:34 AM Iulian Mongescu <iu...@grx.ro>
wrote:

> Hello,
>
>
>
> I'm working on a HDP-2.6.5.0 cluster with kerberos enabled and I have a
> problem with hive as any DDL statement that I run takes minutes to execute
> but any DML run in normal limits. I checked the logs but I didn’t find
> anything that seems related with this problem and I would appreciate any
> help to debug this issue.
>
>
>
> Please find bellow some examples with DDL&DML queries and their durations:
>
>
>
> ------------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1
> (...) STORED AS ORC LOCATION
> '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';
>
> No rows affected (184.191 seconds)
>
> ------------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> show tables;
>
> +-----------+--+
>
> | tab_name |
>
> +-----------+--+
>
> | agenti1 |
>
> +-----------+--+
>
> 1 row selected (0.358 seconds)
>
> ---------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> select count(*) as total from agenti1 where
> 1;
>
> INFO : Tez session hasn't been created yet. Opening session
>
> INFO : Dag name: select count(*) as total from agenti1 wh...1(Stage-1)
>
> INFO : Status: Running (Executing on YARN cluster with App id
> application_1552674174918_0002)
>
>
>
>
> --------------------------------------------------------------------------------
>
> VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
>
>
> --------------------------------------------------------------------------------
>
> Map 1 .......... SUCCEEDED 1 1 0 0 0 0
>
> Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
>
>
> --------------------------------------------------------------------------------
>
> VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 5.48 s
>
>
> --------------------------------------------------------------------------------
>
> +--------+--+
>
> | total |
>
> +--------+--+
>
> | 1960 |
>
> +--------+--+
>
> 1 row selected (15.853 seconds)
>
>
>
> -------------------------------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> drop table agenti1;
>
> No rows affected (184.164 seconds)
>
> --------------------------------------------------------
>
> 0: jdbc:hive2://hdpx03:10000/> CREATE EXTERNAL TABLE IF NOT EXISTS agenti1
> (...) STORED AS ORC LOCATION
> '/staging/core/agenti/2019-03-18/29d52a54eecae3731b31a3d6ef45d012';
>
> No rows affected (190.288 seconds)
>
>
>
> Thanks,
>
>
>
> Iulian
>
>
>