Posted to dev@hive.apache.org by "Attila Magyar (Jira)" <ji...@apache.org> on 2019/10/28 08:37:00 UTC
[jira] [Created] (HIVE-22411) Performance degradation on single row inserts
Attila Magyar created HIVE-22411:
------------------------------------
Summary: Performance degradation on single row inserts
Key: HIVE-22411
URL: https://issues.apache.org/jira/browse/HIVE-22411
Project: Hive
Issue Type: Bug
Components: Hive
Reporter: Attila Magyar
Assignee: Attila Magyar
Fix For: 4.0.0
Attachments: Screen Shot 2019-10-17 at 8.40.50 PM.png
Executing single-row insert statements on a transactional table hurts write performance on an S3 file system. Each insert creates a new delta directory. After each insert, Hive calculates statistics such as the number of files in the table and the total size of the table. To do this it traverses the table directory recursively, and during the recursion a separate listStatus call is executed for each path. As a result, the more delta directories you have, the longer the statistics calculation takes.
Therefore insertion time goes up linearly:
!Screen Shot 2019-10-17 at 8.40.50 PM.png|width=601,height=436!
The fix is to use fs.listFiles(path, /*recursive*/ true) instead of the handcrafted recursive method.
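To make the difference concrete, here is a minimal sketch of the two listing strategies. It is not Hive's actual code: it uses java.nio.file against the local filesystem as a stand-in for Hadoop's FileSystem API, and the directory names (delta_0, bucket_0) are illustrative. The hand-rolled recursion issues one directory-listing call per directory (analogous to the per-path listStatus calls described above), while the single recursive walk corresponds to fs.listFiles(path, true), which on S3 can be served by a flat prefix listing rather than one round trip per directory.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

public class ListingDemo {

    // Hand-rolled recursion: one listing call per directory, so the
    // number of calls grows with the number of delta directories.
    static long[] statsRecursive(Path dir) throws IOException {
        long files = 0, bytes = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (Files.isDirectory(p)) {
                    long[] sub = statsRecursive(p);
                    files += sub[0];
                    bytes += sub[1];
                } else {
                    files++;
                    bytes += Files.size(p);
                }
            }
        }
        return new long[] { files, bytes };
    }

    // Single recursive walk: the analogue of fs.listFiles(path, true).
    static long[] statsFlat(Path dir) throws IOException {
        long[] acc = new long[2];
        try (Stream<Path> s = Files.walk(dir)) {
            s.filter(Files::isRegularFile).forEach(p -> {
                acc[0]++;
                try {
                    acc[1] += Files.size(p);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
        }
        return acc;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a table directory with three delta directories.
        Path root = Files.createTempDirectory("table");
        for (int i = 0; i < 3; i++) {
            Path delta = Files.createDirectory(root.resolve("delta_" + i));
            Files.write(delta.resolve("bucket_0"), new byte[10]);
        }
        long[] a = statsRecursive(root);
        long[] b = statsFlat(root);
        System.out.println(a[0] + " files, " + a[1] + " bytes");
        System.out.println(b[0] + " files, " + b[1] + " bytes");
    }
}
```

Both strategies compute the same statistics (3 files, 30 bytes here); the difference is the number of filesystem calls, which is what dominates on a high-latency store like S3.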
--
This message was sent by Atlassian Jira
(v8.3.4#803005)