
Posted to dev@hive.apache.org by "Rajesh Balamohan (Jira)" <ji...@apache.org> on 2020/10/27 09:50:00 UTC

[jira] [Created] (HIVE-24313) Optimise stats collection for file sizes on cloud storage

Rajesh Balamohan created HIVE-24313:
---------------------------------------

             Summary: Optimise stats collection for file sizes on cloud storage
                 Key: HIVE-24313
                 URL: https://issues.apache.org/jira/browse/HIVE-24313
             Project: Hive
          Issue Type: Improvement
          Components: HiveServer2
            Reporter: Rajesh Balamohan


When stats information is not present (e.g., for an external table), RelOptHiveTable computes basic stats at runtime.

The code path is as follows.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L598]
{code:java}
Statistics stats = StatsUtils.collectStatistics(hiveConf, partitionList,
    hiveTblMetadata, hiveNonPartitionCols, nonPartColNamesThatRqrStats, colStatsCached,
    nonPartColNamesThatRqrStats, true);
 {code}
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L322]
{code:java}
for (Partition p : partList.getNotDeniedPartns()) {
  BasicStats basicStats = basicStatsFactory.build(Partish.buildFor(table, p));
  partStats.add(basicStats);
}
 {code}
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStats.java#L205]
{code:java}
try {
  ds = getFileSizeForPath(path);
} catch (IOException e) {
  ds = 0L;
}
 {code}
 

For a table and query with a large number of partitions, the loop above looks up file sizes one partition at a time, so computing statistics takes a long time and increases compilation time. It would be good to parallelize it with a "ForkJoinPool", e.g. partList.getNotDeniedPartns().parallelStream().forEach((p) -> ...).
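
A minimal, self-contained sketch of that idea is shown here. It is not the Hive patch: ParallelSizeSketch, SizeLookup and collectSizes are hypothetical names, plain path strings stand in for partitions, and the lookup stands in for getFileSizeForPath. It keeps the existing behaviour of falling back to 0 on IOException. Running a parallel stream inside a dedicated ForkJoinPool relies on commonly used but undocumented JDK behaviour, so treat that part as an assumption.

{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

final class ParallelSizeSketch {

  /** Hypothetical stand-in for a per-path size lookup such as getFileSizeForPath. */
  @FunctionalInterface
  interface SizeLookup {
    long sizeOf(String path) throws IOException;
  }

  /**
   * Looks up the size of every path in parallel and returns the sizes
   * in the same order as the input list.
   */
  static List<Long> collectSizes(List<String> paths, SizeLookup lookup, int parallelism)
      throws InterruptedException, ExecutionException {
    // Dedicated pool so slow storage calls do not occupy the common pool.
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      Callable<List<Long>> task = () ->
          paths.parallelStream()
              .map(path -> {
                try {
                  return lookup.sizeOf(path);
                } catch (IOException e) {
                  // Same fallback as the sequential code: unknown size counts as 0.
                  return 0L;
                }
              })
              .collect(Collectors.toList());
      return pool.submit(task).get();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> paths = Arrays.asList("p=1", "p=22", "p=333");
    // Fake lookup for the demo: the "size" is just the length of the path string.
    List<Long> sizes = collectSizes(paths, p -> (long) p.length(), 4);
    System.out.println(sizes);
  }
}
{code}

In the real code path the results would have to be added to partStats in a thread-safe way (for example by collecting into a list as above rather than calling partStats.add from forEach), and the pool size would probably need to be configurable.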

--
This message was sent by Atlassian Jira
(v8.3.4#803005)