Posted to issues-all@impala.apache.org by "Alex Rodoni (JIRA)" <ji...@apache.org> on 2018/08/30 18:27:00 UTC
[jira] [Updated] (IMPALA-4172) Switch from using
getFileBlockLocations to BlockLocation methods (Potential 50% speedup in
metadata loading)
[ https://issues.apache.org/jira/browse/IMPALA-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Rodoni updated IMPALA-4172:
--------------------------------
Docs Text: (was: Improves the performance of block metadata fetching by the Catalog server from the Namenode by substantially reducing the number of RPCs.)
> Switch from using getFileBlockLocations to BlockLocation methods (Potential 50% speedup in metadata loading)
> ------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-4172
> URL: https://issues.apache.org/jira/browse/IMPALA-4172
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 2.8.0
> Reporter: Mostafa Mokhtar
> Assignee: bharath v
> Priority: Critical
> Labels: performance, ramp-up
> Fix For: Impala 2.8.0
>
> Attachments: query_after_invalidate_store_sales_800Kfiles_test.jfr
>
>
> HDFS-8895 removes the ability to query volume IDs from datanodes. This information has instead been added to BlockLocation, which is accessible via various FileSystem APIs (namely, anything that returns LocatedFileStatus).
> This new API is more efficient and more accurate. It is also available from CDH5.5 onwards, so the change can be backported as well.
> getFileBlockLocations is a bottleneck during metadata loading for Impala.
> {code}
> Stack Trace Sample Count Percentage(%)
> java.lang.Thread.run() 17,837 73.758
> java.util.concurrent.ThreadPoolExecutor$Worker.run() 17,837 73.758
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) 17,837 73.758
> java.util.concurrent.FutureTask.run() 17,600 72.778
> com.cloudera.impala.catalog.TableLoadingMgr$2.call() 17,513 72.419
> com.cloudera.impala.catalog.TableLoadingMgr$2.call() 17,513 72.419
> com.cloudera.impala.catalog.TableLoader.load(Db, String) 17,513 72.419
> com.cloudera.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table) 17,513 72.419
> com.cloudera.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table, boolean, boolean, Set) 17,513 72.419
> com.cloudera.impala.catalog.HdfsTable.loadAllPartitions(List, Table) 15,721 65.008
> com.cloudera.impala.catalog.HdfsTable.createPartition(StorageDescriptor, Partition, Map) 13,611 56.283
> com.cloudera.impala.catalog.HdfsTable.updatePartitionFds(Path, boolean, HdfsFileFormat, Map) 7,942 32.841
> com.cloudera.impala.catalog.HdfsTable.loadBlockMetadata(FileSystem, FileStatus, HdfsPartition$FileDescriptor, HdfsFileFormat, Map) 4,319 17.86
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(FileStatus, long, long) 3,678 15.209
> com.cloudera.impala.catalog.HdfsPartition$BlockReplica.parseLocation(String) 203 0.839
> {code}
> Pointer to the Javadoc for the new API:
> [https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/fs/FileSystem.html#listFiles(org.apache.hadoop.fs.Path, boolean)]
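To illustrate why the per-file lookup shows up as a hot spot, here is a minimal sketch that counts simulated RPCs for the two access patterns described in the issue. It is a toy model, not Impala or Hadoop code: `FakeNameNode`, `RpcCountSketch`, and all their methods are hypothetical stand-ins, and the model assumes that a listing costs a single RPC (real listings of large directories are typically returned in batches, so actual counts will differ).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a namenode. NOT a Hadoop class.
// Every method that models a remote call increments rpcCount by one.
class FakeNameNode {
    private final Map<String, List<String>> hostsByFile = new LinkedHashMap<>();
    private int rpcCount = 0;

    // Local setup helper; does not count as an RPC.
    void addFile(String path, List<String> blockHosts) {
        hostsByFile.put(path, blockHosts);
    }

    int rpcCount() {
        return rpcCount;
    }

    // Old pattern, step 1: list file paths only (assumed: one RPC).
    List<String> listStatus() {
        rpcCount++;
        return new ArrayList<>(hostsByFile.keySet());
    }

    // Old pattern, step 2: fetch block hosts for a single file (one RPC per file).
    List<String> getFileBlockLocations(String path) {
        rpcCount++;
        return hostsByFile.get(path);
    }

    // New pattern: the listing already carries block hosts (assumed: one RPC).
    Map<String, List<String>> listLocatedStatus() {
        rpcCount++;
        return new LinkedHashMap<>(hostsByFile);
    }
}

class RpcCountSketch {
    // Builds a fake namenode holding numFiles single-block files.
    static FakeNameNode nameNodeWith(int numFiles) {
        FakeNameNode nn = new FakeNameNode();
        for (int i = 0; i < numFiles; i++) {
            nn.addFile("/tbl/part-" + i, Arrays.asList("host-" + (i % 3)));
        }
        return nn;
    }

    // List first, then ask for block locations file by file: 1 + numFiles RPCs.
    static int rpcsPerFileLookup(int numFiles) {
        FakeNameNode nn = nameNodeWith(numFiles);
        for (String path : nn.listStatus()) {
            nn.getFileBlockLocations(path);
        }
        return nn.rpcCount();
    }

    // Single listing that returns locations with each entry: 1 RPC in this model.
    static int rpcsLocatedListing(int numFiles) {
        FakeNameNode nn = nameNodeWith(numFiles);
        nn.listLocatedStatus();
        return nn.rpcCount();
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("per-file lookup: " + rpcsPerFileLookup(n) + " simulated RPCs");
        System.out.println("located listing: " + rpcsLocatedListing(n) + " simulated RPCs");
    }
}
```

In the real Hadoop API, the analogous calls would be a per-file `getFileBlockLocations(FileStatus, long, long)` versus a listing such as `FileSystem.listFiles(Path, boolean)` that returns `LocatedFileStatus` entries, each exposing `getBlockLocations()`. The model only shows how the call count scales with the number of files; it says nothing about the actual speedup.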