You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/09/02 19:00:00 UTC
[jira] [Work logged] (HIVE-26496) FetchOperator scans delete_delta folders multiple times causing slowness
[ https://issues.apache.org/jira/browse/HIVE-26496?focusedWorklogId=805856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-805856 ]
ASF GitHub Bot logged work on HIVE-26496:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 02/Sep/22 18:59
Start Date: 02/Sep/22 18:59
Worklog Time Spent: 10m
Work Description: sonarcloud[bot] commented on PR #3559:
URL: https://github.com/apache/hive/pull/3559#issuecomment-1235814585
Kudos, SonarCloud Quality Gate passed! [![Quality Gate passed](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/QualityGateBadge/passed-16px.png 'Quality Gate passed')](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=3559)
[![Bug](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/bug-16px.png 'Bug')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3559&resolved=false&types=BUG) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3559&resolved=false&types=BUG) [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3559&resolved=false&types=BUG)
[![Vulnerability](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/vulnerability-16px.png 'Vulnerability')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3559&resolved=false&types=VULNERABILITY) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3559&resolved=false&types=VULNERABILITY) [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3559&resolved=false&types=VULNERABILITY)
[![Security Hotspot](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/security_hotspot-16px.png 'Security Hotspot')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3559&resolved=false&types=SECURITY_HOTSPOT) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3559&resolved=false&types=SECURITY_HOTSPOT) [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=3559&resolved=false&types=SECURITY_HOTSPOT)
[![Code Smell](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/common/code_smell-16px.png 'Code Smell')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3559&resolved=false&types=CODE_SMELL) [![A](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/RatingBadge/A-16px.png 'A')](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3559&resolved=false&types=CODE_SMELL) [0 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=3559&resolved=false&types=CODE_SMELL)
[![No Coverage information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/CoverageChart/NoCoverageInfo-16px.png 'No Coverage information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3559&metric=coverage&view=list) No Coverage information
[![No Duplication information](https://sonarsource.github.io/sonarcloud-github-static-resources/v2/checks/Duplications/NoDuplicationInfo-16px.png 'No Duplication information')](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=3559&metric=duplicated_lines_density&view=list) No Duplication information
Issue Time Tracking
-------------------
Worklog Id: (was: 805856)
Time Spent: 50m (was: 40m)
> FetchOperator scans delete_delta folders multiple times causing slowness
> ------------------------------------------------------------------------
>
> Key: HIVE-26496
> URL: https://issues.apache.org/jira/browse/HIVE-26496
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> FetchOperator scans way too many number of files/directories than needed.
> For e.g here is a layout of a table which had set of updates and deletes. There are set of "delta" and "delete_delta" folders which are created.
> {noformat}
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/base_0000001
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000002_0000002_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000003_0000003_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000004_0000004_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000005_0000005_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000006_0000006_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000007_0000007_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000008_0000008_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000009_0000009_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000010_0000010_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000011_0000011_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000012_0000012_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000013_0000013_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000014_0000014_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000015_0000015_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000016_0000016_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000017_0000017_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000018_0000018_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000019_0000019_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000020_0000020_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000021_0000021_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delete_delta_0000022_0000022_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000002_0000002_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000003_0000003_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000004_0000004_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000005_0000005_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000006_0000006_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000007_0000007_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000008_0000008_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000009_0000009_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000010_0000010_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000011_0000011_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000012_0000012_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000013_0000013_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000014_0000014_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000015_0000015_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000016_0000016_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000017_0000017_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000018_0000018_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000019_0000019_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000020_0000020_0000
> s3a://bucket-name/warehouse/tablespace/managed/hive/test.db/date_dim/delta_0000021_0000021_0000
> {noformat}
>
> When user runs *{color:#0747a6}{{select * from date_dim}}{color}* from beeline, FetchOperator tries to compute splits in "date_dim". This "base" and "delta" folders and computes 21 splits.
> However, for each of the 21 splits, it ends up loading entire "delete_delta" folders and scans unnecessarily. This increases the scan by "21 splits * 21 delete_delta folders" (i.e 1396) times. This makes the statement execution super slow, even when there is minimal dataset present in the table.
> It will be good to scan only relevant delete_delta folder in the split, instead of loading all delete_delta folders in every split.
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java#L1142|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java#L1142]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java#L1172|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java#L1172]
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java#L402|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java#L402]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)