Posted to issues@hive.apache.org by "Amit Saonerkar (Jira)" <ji...@apache.org> on 2022/11/01 12:07:00 UTC

[jira] [Commented] (HIVE-26437) dump unpartitioned managed table metadata in parallel

    [ https://issues.apache.org/jira/browse/HIVE-26437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627126#comment-17627126 ] 

Amit Saonerkar commented on HIVE-26437:
---------------------------------------

*+JMH Performance Benchmark Test+* (itests/hive-jmh/src/main/java/org/apache/hive/benchmark/ql/exec/TableAndPartitionExportBench.java)

The test runs in two modes: parallel and serial. The parallel mode uses ExportService and creates 100 threads to execute a total of 500 tasks. The serial mode does not use ExportService and executes the same 500 tasks on a single thread. Each mode is run for 5 iterations, with each iteration timed in milliseconds, and the average time over the 5 iterations is reported as the benchmark result so the two modes can be compared.
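The two modes described above can be sketched roughly as follows. This is a minimal illustration and not the code in TableAndPartitionExportBench: the class name `ExportModesSketch`, the `simulatedExport` task (which only sleeps briefly) and the use of a plain `ExecutorService` in place of Hive's ExportService are all assumptions made for the example. Only the 100 threads and 500 tasks come from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ExportModesSketch {
    public static final int THREADS = 100; // thread count from the benchmark description
    public static final int TASKS = 500;   // task count from the benchmark description

    // Hypothetical stand-in for one table export; it only sleeps for a moment.
    static void simulatedExport(AtomicInteger done) {
        try {
            Thread.sleep(2);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        done.incrementAndGet();
    }

    // Serial mode: all tasks run one after another on the calling thread.
    public static int runSerial() {
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < TASKS; i++) {
            simulatedExport(done);
        }
        return done.get();
    }

    // Parallel mode: the same tasks are submitted to a fixed pool of THREADS workers.
    public static int runParallel() {
        AtomicInteger done = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < TASKS; i++) {
                futures.add(pool.submit(() -> simulatedExport(done)));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every task to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel run failed", e);
        } finally {
            pool.shutdown();
        }
        return done.get();
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        int serial = runSerial();
        long t1 = System.nanoTime();
        int parallel = runParallel();
        long t2 = System.nanoTime();
        System.out.printf("serial:   %d tasks in %d ms%n", serial, (t1 - t0) / 1_000_000);
        System.out.printf("parallel: %d tasks in %d ms%n", parallel, (t2 - t1) / 1_000_000);
    }
}
```

The timings printed by this sketch depend entirely on the simulated sleep and say nothing about Hive itself; the measured numbers are the JMH results reported in this comment.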





*Results:*

The JMH benchmark results indicate an improvement in the TableExport operation: when tables are dumped in parallel rather than serially, the operation completes roughly 80 times faster on average (640.862 ms/op versus 51697.322 ms/op). The results below are for 500 table export operations run in both serial and parallel modes.

Result "org.apache.hive.benchmark.ql.exec.TableAndPartitionExportBench.BaseBench.parallel":

N = 5

mean = 640.862 ±(99.9%) 113.354 ms/op

Result "org.apache.hive.benchmark.ql.exec.TableAndPartitionExportBench.BaseBench.serial":

N = 5

mean = 51697.322 ±(99.9%) 322.747 ms/op

|*Benchmark*|*Mode*|*Cnt*|*Score*|*Error*|*Units*|
|TableAndPartitionExportBench.BaseBench.parallel|ss|5|640.862|± 113.354|ms/op|
|TableAndPartitionExportBench.BaseBench.serial|ss|5|51697.322|± 322.747|ms/op|

 

 

*+End-to-End Performance Benchmark Numbers+*

A database is created with 1,000 managed ACID tables, all of them unpartitioned.

The config parameter REPL_TABLE_DUMP_PARALLELISM is set to 100 before the replication dump command is executed. With the new export service, the replication dump takes 9 sec to complete the table metadata dump.

Without the export service, the same number of tables takes around 27 sec to complete the entire dump process. This is a 3x improvement in the execution time of the replication dump command.

 
|*No. of Tables*|*REPL_TABLE_DUMP_PARALLELISM*|*Export Service used*|*Time taken for REPL DUMP*|
|1000|100|Yes|9 sec|
|1000|100|No|27 sec|
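
For reference, the speedups implied by the two sets of numbers in this comment are plain ratios of serial time to parallel time. The snippet below is only a worked calculation; the class and method names are hypothetical and the inputs are the values reported above.

```java
public class SpeedupSketch {
    // Speedup = time without parallelism / time with parallelism.
    public static double speedup(double serialMs, double parallelMs) {
        return serialMs / parallelMs;
    }

    public static void main(String[] args) {
        // JMH micro-benchmark: mean ms/op for serial and parallel modes.
        System.out.printf("JMH benchmark: %.1fx%n", speedup(51697.322, 640.862));
        // End-to-end REPL DUMP: 27 sec without and 9 sec with the export service.
        System.out.printf("REPL DUMP:     %.1fx%n", speedup(27_000, 9_000));
    }
}
```

The micro-benchmark ratio comes out at roughly 80x and the end-to-end ratio at 3x; the gap is expected, since the end-to-end dump includes work other than the table export tasks.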

 

> dump unpartitioned managed table metadata in parallel
> -----------------------------------------------------
>
>                 Key: HIVE-26437
>                 URL: https://issues.apache.org/jira/browse/HIVE-26437
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Amit Saonerkar
>            Assignee: Amit Saonerkar
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>



