Posted to issues-all@impala.apache.org by "Peikai Zheng (JIRA)" <ji...@apache.org> on 2018/09/27 22:28:00 UTC

[jira] [Updated] (IMPALA-7627) Parallel the fetching permission process

     [ https://issues.apache.org/jira/browse/IMPALA-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peikai Zheng updated IMPALA-7627:
---------------------------------
    Description: 
Catalogd loads the metadata of a table in three phases:
First, Catalogd fetches the table's metadata from the Hive metastore.
Then, Catalogd fetches the permissions of each partition from the HDFS NameNode.
Finally, Catalogd loads the file descriptors from the HDFS NameNode.
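To make the sequential structure concrete, here is a minimal Python sketch of the three phases running one after another, each timed separately. It is an illustration under stated assumptions, not Catalogd's code: the function names (`fetch_hms_metadata`, `fetch_permission`, `load_file_descriptors`, `load_table`) are hypothetical, and `time.sleep` stands in for the Hive metastore and NameNode round trips.

```python
import time
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the three phases. time.sleep() simulates the
# Hive metastore / NameNode round trips; none of this is Catalogd's real code.

def fetch_hms_metadata(table: str) -> List[str]:
    """Phase 1 (assumed): get the table's partition locations from the metastore."""
    time.sleep(0.01)
    return [f"/warehouse/{table}/part={i}" for i in range(5)]

def fetch_permission(path: str) -> str:
    """Phase 2 (assumed): one NameNode permission lookup per partition."""
    time.sleep(0.01)
    return "READ_WRITE"

def load_file_descriptors(paths: List[str]) -> int:
    """Phase 3 (assumed): load file descriptors; here it only returns the partition count."""
    time.sleep(0.01)
    return len(paths)

def load_table(table: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Run the phases strictly one after another, timing each of them."""
    timings: Dict[str, float] = {}

    start = time.perf_counter()
    paths = fetch_hms_metadata(table)
    timings["phase 1"] = time.perf_counter() - start

    start = time.perf_counter()
    # Sequential: each partition's lookup waits for the previous one to finish.
    permissions = {p: fetch_permission(p) for p in paths}
    timings["phase 2"] = time.perf_counter() - start

    start = time.perf_counter()
    load_file_descriptors(paths)
    timings["phase 3"] = time.perf_counter() - start
    return timings, permissions

if __name__ == "__main__":
    timings, _ = load_table("default.revenue_day_est")
    print(timings)
```

In this model, phase 2 grows linearly with the number of partitions, because each permission lookup blocks the next one.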

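According to my test results (based on commit *11554a17c75b242767d5a50d66bc2874aa545c77*), the average time of each phase per table is:
||Table (average time, GetFileInfoThread=10)||phase 1 (HMS metadata)||phase 2 (permissions)||phase 3 (file descriptors)||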
|idm.sauron_message|9.9917115|459.2106944|95.0179163|
|default.revenue_enriched|12.3377474|111.2969046|40.827472|
|default.upp_raw_prod|1.5143162|50.0251426|12.6805323|
|default.hit_to_beacon_playback_prod|1.4294509|49.7670539|18.3557858|
|default.sitetracking_enriched|13.0003804|112.8746656|42.1824032|
|default.player_custom_event|9.2618705|493.4865302|116.4986184|
|default.revenue_day_est|57.9116561|106.5028664|24.005822|

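Most of the time is spent in the second phase: in every table above, phase 2 takes longer than phases 1 and 3 combined, accounting for roughly 57% to 81% of the total.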
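The phase-2 share of each table's total load time can be checked directly from the measurements. The numbers below are copied from the table; the issue does not state their unit, so this sketch only computes ratios.

```python
# Average per-phase times copied from the measurement table (GetFileInfoThread=10).
# Units are not stated in the issue, so only ratios are computed here.
RESULTS = {
    "idm.sauron_message": (9.9917115, 459.2106944, 95.0179163),
    "default.revenue_enriched": (12.3377474, 111.2969046, 40.827472),
    "default.upp_raw_prod": (1.5143162, 50.0251426, 12.6805323),
    "default.hit_to_beacon_playback_prod": (1.4294509, 49.7670539, 18.3557858),
    "default.sitetracking_enriched": (13.0003804, 112.8746656, 42.1824032),
    "default.player_custom_event": (9.2618705, 493.4865302, 116.4986184),
    "default.revenue_day_est": (57.9116561, 106.5028664, 24.005822),
}

def phase2_share(times):
    """Fraction of the total (phase 1 + phase 2 + phase 3) spent in phase 2."""
    return times[1] / sum(times)

if __name__ == "__main__":
    for table, times in RESULTS.items():
        print(f"{table}: {phase2_share(times):.0%}")
```

The printed shares range from 57% (default.revenue_day_est) to 81% (idm.sauron_message).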

So I suggest parallelizing the second phase.
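A minimal sketch of the proposal, assuming the per-partition permission lookups are independent and safe to run concurrently. The names are hypothetical, `time.sleep` simulates NameNode latency, and the default pool size of 10 is an assumption that merely mirrors GetFileInfoThread=10 from the measurements. It is written in Python for illustration and is not Catalogd's actual implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def fetch_permission(path: str) -> str:
    """Hypothetical stand-in for one NameNode permission lookup.

    The sleep models the round-trip latency that dominates phase 2.
    """
    time.sleep(0.02)
    return "READ_WRITE"

def fetch_permissions_sequential(paths: List[str]) -> Dict[str, str]:
    """Sequential model: one lookup after another."""
    return {p: fetch_permission(p) for p in paths}

def fetch_permissions_parallel(paths: List[str], num_threads: int = 10) -> Dict[str, str]:
    """Proposed model: issue lookups concurrently from a bounded thread pool.

    Executor.map returns results in input order, so they line up with paths.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return dict(zip(paths, pool.map(fetch_permission, paths)))

if __name__ == "__main__":
    paths = [f"/warehouse/t/part={i}" for i in range(20)]
    for fn in (fetch_permissions_sequential, fetch_permissions_parallel):
        start = time.perf_counter()
        result = fn(paths)
        print(f"{fn.__name__}: {len(result)} partitions in {time.perf_counter() - start:.2f}s")
```

Because each lookup spends almost all of its time waiting, threads overlap well here (`time.sleep`, like blocking network I/O, releases Python's GIL). A bounded pool, rather than one thread per partition, keeps the extra load on the shared NameNode under control.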

  was:
Catalogd loads the metadata of a table in three phases:
First, Catalogd fetches the table's metadata from the Hive metastore.
Then, Catalogd fetches the permissions of each partition from the HDFS NameNode.
Finally, Catalogd loads the file descriptors from the HDFS NameNode.

According to my test results:

||Average Time(GetFileInfoThread=10)||phase 1||phase 2||phase 3||
|idm.sauron_message|9.9917115|459.2106944|95.0179163|
|default.revenue_enriched|12.3377474|111.2969046|40.827472|
|default.upp_raw_prod|1.5143162|50.0251426|12.6805323|
|default.hit_to_beacon_playback_prod|1.4294509|49.7670539|18.3557858|
|default.sitetracking_enriched|13.0003804|112.8746656|42.1824032|
|default.player_custom_event|9.2618705|493.4865302|116.4986184|
|default.revenue_day_est|57.9116561|106.5028664|24.005822|

Most of the time is spent in the second phase.

So I suggest parallelizing the second phase.


> Parallel the fetching permission process
> ----------------------------------------
>
>                 Key: IMPALA-7627
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7627
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Peikai Zheng
>            Assignee: Peikai Zheng
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
