Posted to dev@hive.apache.org by "t oo (JIRA)" <ji...@apache.org> on 2019/03/29 17:39:03 UTC

[jira] [Created] (HIVE-21546) hiveserver2 - “mapred.FileInputFormat: Total input files to process” - why single threaded?

t oo created HIVE-21546:
---------------------------

             Summary: hiveserver2 - “mapred.FileInputFormat: Total input files to process” - why single threaded?
                 Key: HIVE-21546
                 URL: https://issues.apache.org/jira/browse/HIVE-21546
             Project: Hive
          Issue Type: Bug
            Reporter: t oo


I have set up Hive (v2.3.4) with Spark as the execution engine (MapReduce shows the same issue). My external Hive table is stored as Parquet on S3 across hundreds of partitions. The following settings are all set to 20:

* {{hive.exec.input.listing.max.threads}}
* {{mapred.dfsclient.parallelism.max}}
* {{mapreduce.input.fileinputformat.list-status.num-threads}}

I then run a simple query:

{{select * from s.there where h_code = 'KGD78' and h_no = '265'}}

I can see the following in the HiveServer2 logs (the entries continue for more than 1000 lines, covering all the different partitions). Why is the file listing not done in parallel? The listing alone takes more than 5 minutes.

{noformat}
2019-03-29T11:29:26,866 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] compress.CodecPool: Got brand-new decompressor [.snappy]
2019-03-29T11:29:27,283 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:27,797 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:28,374 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:28,919 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:29,483 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:30,003 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:30,518 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:31,001 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:31,549 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:32,048 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:32,574 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:33,130 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:33,639 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:34,189 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:34,743 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:35,208 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:35,701 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:36,183 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:36,662 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:37,154 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:37,645 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
{noformat}
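The timestamps in the log excerpt are enough to estimate the cost of each listing. The Python sketch below is only an illustration (it is not part of Hive): it pulls the {{mapred.FileInputFormat}} entries out of a verbatim slice of the log, measures the gap between consecutive entries, and collects the thread names. On the first five listing entries the mean gap is 0.55 s, all on {{Thread-53}}. Across all 21 listing entries in the excerpt (11:29:27,283 to 11:29:37,645) the span is about 10.4 s, roughly 0.52 s per listing, so 1000 partitions would take around 520 s (about 8.6 minutes), consistent with the 5+ minutes observed.

```python
import re
import statistics
from datetime import datetime

# Verbatim slice of the HiveServer2 log (first six entries).
LOG_EXCERPT = """\
2019-03-29T11:29:26,866 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] compress.CodecPool: Got brand-new decompressor [.snappy]
2019-03-29T11:29:27,283 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:27,797 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:28,374 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:28,919 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:29,483 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
"""

# Timestamp, then the thread name after the last ": " inside the brackets.
LISTING_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}) INFO "
    r"\[[^\]]*: (?P<thread>[^\]]+)\] mapred\.FileInputFormat: Total input files to process"
)


def listing_events(log_text: str) -> list[tuple[datetime, str]]:
    """Return (timestamp, thread) for every FileInputFormat listing line."""
    events = []
    for line in log_text.splitlines():
        match = LISTING_RE.match(line)
        if match:
            ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S,%f")
            events.append((ts, match["thread"]))
    return events


def summarize(events: list[tuple[datetime, str]]) -> tuple[float, set[str]]:
    """Mean seconds between consecutive listings and the set of threads seen."""
    times = [ts for ts, _ in events]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return statistics.mean(gaps), {thread for _, thread in events}


if __name__ == "__main__":
    mean_gap, threads = summarize(listing_events(LOG_EXCERPT))
    print(f"mean gap: {mean_gap:.3f} s, threads: {sorted(threads)}")
    # Extrapolate to the ~1000 partitions mentioned in the report.
    print(f"estimated for 1000 partitions: {1000 * mean_gap / 60:.1f} min")
```

A single thread name together with evenly spaced gaps is what one would expect if each partition is listed only after the previous one finishes.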

I've tried setting

* {{hive.exec.input.listing.max.threads}}
* {{mapred.dfsclient.parallelism.max}}
* {{mapreduce.input.fileinputformat.list-status.num-threads}}

to their defaults, to 1, and to 50, and the result is the same every time.
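For comparison, here is a hypothetical Python sketch of what parallel listing should buy: the same partitions listed one at a time versus through a fixed-size thread pool, which is presumably the role the thread-count settings are meant to play. It does not call Hive or Hadoop. {{list_partition}} is a made-up stand-in that sleeps to simulate one remote listing call, the latency is an assumption scaled down from the roughly 0.5 s per listing seen in the log, and the partition names are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical latency of one remote listing call. The log suggests roughly
# 0.5 s per call; it is scaled down here so the demo runs quickly.
SIMULATED_LATENCY_S = 0.05


def list_partition(partition: str) -> list[str]:
    """Made-up stand-in for listing one partition directory (not a Hive API)."""
    time.sleep(SIMULATED_LATENCY_S)  # simulate the network round trip
    return [f"{partition}/file-0"]


def list_sequential(partitions: list[str]) -> list[list[str]]:
    """List partitions one after another, as the log suggests is happening."""
    return [list_partition(p) for p in partitions]


def list_parallel(partitions: list[str], num_threads: int) -> list[list[str]]:
    """List partitions concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(list_partition, partitions))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    partitions = [f"partition_{i:03d}" for i in range(20)]  # invented names
    seq, t_seq = timed(list_sequential, partitions)
    par, t_par = timed(list_parallel, partitions, 20)
    assert seq == par  # same listing, different wall-clock time
    print(f"sequential: {t_seq:.2f} s, 20 threads: {t_par:.2f} s")
```

With 20 simulated partitions the sequential version takes at least 1 s, while the 20-thread version finishes in roughly the time of a single call. A speedup of that kind is what the thread-count settings would be expected to produce if the listing actually ran on multiple threads.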



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)