Posted to issues-all@impala.apache.org by "Peikai Zheng (JIRA)" <ji...@apache.org> on 2018/09/28 08:55:00 UTC

[jira] [Comment Edited] (IMPALA-7627) Parallel the fetching permission process

    [ https://issues.apache.org/jira/browse/IMPALA-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631140#comment-16631140 ] 

Peikai Zheng edited comment on IMPALA-7627 at 9/28/18 8:54 AM:
---------------------------------------------------------------

[~bharathv] The test is based on commit 11554a17c75b242767d5a50d66bc2874aa545c77, so the improvements from IMPALA-7320 were not included.

My approach is to parallelize the permission-fetching process with a thread pool, that is, to run the function [createPartition|https://github.com/apache/impala/blob/branch-3.0.0/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L810] in parallel.
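
A minimal sketch of the thread-pool idea, not the actual Impala code: the class name ParallelPermissionFetch, the method fetchAll, the lookup function and the example paths are all hypothetical stand-ins. The lookup represents whatever per-partition call asks the NameNode for a permission.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelPermissionFetch {
    /**
     * Looks up the permission of every partition path on a fixed-size thread pool.
     * `lookup` is a hypothetical stand-in for the per-partition NameNode call.
     */
    public static Map<String, String> fetchAll(
            List<String> partitionPaths, Function<String, String> lookup, int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            // Submit one task per partition; at most numThreads run at the same time.
            List<Future<String>> futures = new ArrayList<>();
            for (String path : partitionPaths) {
                Callable<String> task = () -> lookup.apply(path);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order so the output order is deterministic.
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < partitionPaths.size(); i++) {
                result.put(partitionPaths.get(i), futures.get(i).get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical partition paths; a real lookup would query HDFS instead.
        List<String> paths = Arrays.asList(
                "/warehouse/t/day=1", "/warehouse/t/day=2", "/warehouse/t/day=3");
        Map<String, String> perms = fetchAll(paths, p -> "rwxr-xr-x", 4);
        perms.forEach((path, perm) -> System.out.println(path + " -> " + perm));
    }
}
```

Since each lookup waits mostly on a remote call, a pool like this lets several requests be in flight at once. The pool size of 4 is arbitrary here and would need tuning against the load the NameNode can take.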

I will measure the time cost again on the newest version.



> Parallel the fetching permission process
> ----------------------------------------
>
>                 Key: IMPALA-7627
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7627
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Peikai Zheng
>            Assignee: Peikai Zheng
>            Priority: Major
>
> Catalogd loads the metadata of a table in three phases.
>  First, Catalogd fetches the metadata from the Hive metastore;
>  Then, Catalogd fetches the permissions of each partition from the HDFS NameNode;
>  Finally, Catalogd loads the file descriptors from the HDFS NameNode.
> According to my test results (based on commit *11554a17c75b242767d5a50d66bc2874aa545c77*):
> ||Average Time(GetFileInfoThread=10)||phase 1||phase 2||phase 3||
> |idm.sauron_message|9.9917115|459.2106944|95.0179163|
> |default.revenue_enriched|12.3377474|111.2969046|40.827472|
> |default.upp_raw_prod|1.5143162|50.0251426|12.6805323|
> |default.hit_to_beacon_playback_prod|1.4294509|49.7670539|18.3557858|
> |default.sitetracking_enriched|13.0003804|112.8746656|42.1824032|
> |default.player_custom_event|9.2618705|493.4865302|116.4986184|
> |default.revenue_day_est|57.9116561|106.5028664|24.005822|
>  Detailed Information of tables:
> ||Table||#Partitions||#Files||Size(without replica) / TB||Size(with replica) / TB||
> |idm.sauron_message|12923|69537|44.4|90.3|
> |default.revenue_enriched|1809|1832001|145.5|308.6|
> |default.upp_raw_prod|801|480000|186.3|424|
> |default.hit_to_beacon_playback_prod|777|793900|46.6|139.9|
> |default.sitetracking_enriched|1809|1842049|21.7|65|
> |default.player_custom_event|8816|2197096|47.2|141.5|
> |default.revenue_day_est|1731|109815|25.9|77.6|
> The second phase takes the majority of the time, so I suggest parallelizing it.
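
The share of loading time spent in the second phase can be checked directly from the averages reported in the first table above. The following is a small sketch that does this arithmetic; the class name PhaseShare and the method phase2Share are made up for illustration, and the numbers are copied from the table as posted (the issue does not state their unit).

```java
public class PhaseShare {
    /** Fraction of the total load time (phase 1 + 2 + 3) spent in phase 2. */
    public static double phase2Share(double phase1, double phase2, double phase3) {
        return phase2 / (phase1 + phase2 + phase3);
    }

    public static void main(String[] args) {
        String[] tables = {
            "idm.sauron_message", "default.revenue_enriched", "default.upp_raw_prod",
            "default.hit_to_beacon_playback_prod", "default.sitetracking_enriched",
            "default.player_custom_event", "default.revenue_day_est"
        };
        // Average time per phase, copied from the table in the issue description.
        double[][] times = {
            {9.9917115, 459.2106944, 95.0179163},
            {12.3377474, 111.2969046, 40.827472},
            {1.5143162, 50.0251426, 12.6805323},
            {1.4294509, 49.7670539, 18.3557858},
            {13.0003804, 112.8746656, 42.1824032},
            {9.2618705, 493.4865302, 116.4986184},
            {57.9116561, 106.5028664, 24.005822}
        };
        for (int i = 0; i < tables.length; i++) {
            double share = phase2Share(times[i][0], times[i][1], times[i][2]);
            System.out.printf("%-40s phase 2 share: %.1f%%%n", tables[i], share * 100);
        }
    }
}
```

For every table listed, the second phase comes out above half of the total.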


