Posted to issues@hive.apache.org by "t oo (JIRA)" <ji...@apache.org> on 2019/03/29 18:14:00 UTC

[jira] [Updated] (HIVE-21546) hiveserver2 - “mapred.FileInputFormat: Total input files to process” - why single threaded?

     [ https://issues.apache.org/jira/browse/HIVE-21546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

t oo updated HIVE-21546:
------------------------
    Description: 
I have set up Hive (v2.3.4) on Spark (as the execution engine; MR hits the same issue) with Hadoop 2.7.6 (or Hadoop 2.8.5). My external Hive table is in Parquet format on S3, spread across hundreds of partitions. The settings below are all set to 20:

{{hive.exec.input.listing.max.threads}}
{{mapred.dfsclient.parallelism.max}}
{{mapreduce.input.fileinputformat.list-status.num-threads}}
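
For reference, a minimal sketch of applying these per session from beeline or the Hive CLI (the three properties are the ones named above; 20 is the value used in this report):

-- Sketch: apply the three listing-related properties for the current session
-- (value 20, as described in this report)
set hive.exec.input.listing.max.threads=20;
set mapred.dfsclient.parallelism.max=20;
set mapreduce.input.fileinputformat.list-status.num-threads=20;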

Run a simple query:

{{select * from s.t where h_code = 'KGD78' and h_no = '265'}}
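
A quick way to sanity-check that the filter actually prunes partitions (assuming h_code and h_no are the table's partition columns, which the description implies):

-- Hedged sketch: confirm partition pruning for the query above
show partitions s.t;   -- partitions the metastore knows about
explain dependency
select * from s.t where h_code = 'KGD78' and h_no = '265';   -- prints the input partitions the query will read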

I can see the following in the HiveServer2 logs (they continue for more than 1,000 lines, listing all the different partitions). Why is the file listing not done in parallel? It takes more than 5 minutes just for the listing.

{{2019-03-29T11:29:26,866 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] compress.CodecPool: Got brand-new decompressor [.snappy]
2019-03-29T11:29:27,283 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:27,797 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:28,374 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:28,919 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:29,483 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:30,003 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:30,518 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:31,001 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:31,549 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:32,048 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:32,574 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:33,130 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:33,639 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:34,189 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:34,743 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:35,208 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:35,701 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:36,183 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:36,662 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:37,154 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:37,645 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1}}

I've tried

{{hive.exec.input.listing.max.threads}}
{{mapred.dfsclient.parallelism.max}}
{{mapreduce.input.fileinputformat.list-status.num-threads}}

with the defaults, with 1, and with 50... still the same result.
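
To rule out the settings simply not taking effect, a bare {{set}} prints the value currently in force for the session:

-- Sketch: a bare `set <property>;` echoes name=value for the running session
set hive.exec.input.listing.max.threads;
set mapred.dfsclient.parallelism.max;
set mapreduce.input.fileinputformat.list-status.num-threads;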

Hive 3.1.1 / Hadoop 3.1.2 also has the issue:

2019-03-29T18:10:15,451 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: at row 0. reading next block
2019-03-29T18:10:15,461 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: block read in memory in 10 ms. row count = 4584
2019-03-29T18:10:15,620 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T18:10:15,714 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:15,757 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:15,767 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 4584 records.
2019-03-29T18:10:15,767 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: at row 0. reading next block
2019-03-29T18:10:15,777 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: block read in memory in 10 ms. row count = 4584
2019-03-29T18:10:15,984 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T18:10:16,033 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:16,070 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:16,080 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 4584 records.
2019-03-29T18:10:16,080 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: at row 0. reading next block
2019-03-29T18:10:16,089 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: block read in memory in 9 ms. row count = 4584
2019-03-29T18:10:16,287 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T18:10:16,356 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:16,404 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:16,415 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 4584 records.
2019-03-29T18:10:16,415 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: at row 0. reading next block
2019-03-29T18:10:16,426 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: block read in memory in 11 ms. row count = 4584
2019-03-29T18:10:16,613 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T18:10:16,654 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:16,700 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:16,712 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 240 records.
2019-03-29T18:10:16,712 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: at row 0. reading next block
2019-03-29T18:10:16,722 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: block read in memory in 10 ms. row count = 240
2019-03-29T18:10:16,895 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T18:10:16,934 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:17,004 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:17,015 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 240 records.
2019-03-29T18:10:17,015 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: at row 0. reading next block
2019-03-29T18:10:17,024 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: block read in memory in 9 ms. row count = 240
2019-03-29T18:10:17,217 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T18:10:17,269 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:17,306 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:17,315 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 240 records.
2019-03-29T18:10:17,315 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: at row 0. reading next block
2019-03-29T18:10:17,325 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: block read in memory in 10 ms. row count = 240
2019-03-29T18:10:17,478 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T18:10:17,513 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:17,548 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:17,559 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 240 records.
2019-03-29T18:10:17,559 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: at row 0. reading next block
2019-03-29T18:10:17,568 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: block read in memory in 9 ms. row count = 240
2019-03-29T18:10:17,729 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T18:10:17,805 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:17,845 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] s3a.S3AInputStream: Switching to Random IO seek policy
2019-03-29T18:10:17,854 INFO [16b32706-3490-432d-b49e-67279ea88e15 HiveServer2-Handler-Pool: Thread-30] hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 4584 records.
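
Worth noting: every line above runs on a single {{HiveServer2-Handler-Pool}} thread, i.e. HiveServer2 itself is listing and reading the Parquet files, which is what happens when a simple {{select *}} is converted to a fetch task instead of a Spark/MR job; the listing thread settings would then never come into play. That is only a hypothesis, but it is cheap to test with the {{hive.fetch.task.conversion}} setting:

-- Hypothesis check, not a confirmed diagnosis: disable fetch-task conversion
-- so the query runs as a real job, then see whether the listing parallelizes.
set hive.fetch.task.conversion;       -- show the current value (Hive's default is 'more')
set hive.fetch.task.conversion=none;  -- force job execution instead of an in-server fetch
select * from s.t where h_code = 'KGD78' and h_no = '265';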

  was:
I have set up Hive (v2.3.4) on Spark (as the execution engine; MR hits the same issue). My external Hive table is in Parquet format on S3, spread across hundreds of partitions. The settings below are all set to 20:

{{hive.exec.input.listing.max.threads}}
{{mapred.dfsclient.parallelism.max}}
{{mapreduce.input.fileinputformat.list-status.num-threads}}

Run a simple query:

{{select * from s.t where h_code = 'KGD78' and h_no = '265'}}

I can see the following in the HiveServer2 logs (they continue for more than 1,000 lines, listing all the different partitions). Why is the file listing not done in parallel? It takes more than 5 minutes just for the listing.

{{2019-03-29T11:29:26,866 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] compress.CodecPool: Got brand-new decompressor [.snappy]
2019-03-29T11:29:27,283 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:27,797 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:28,374 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:28,919 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:29,483 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:30,003 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:30,518 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:31,001 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:31,549 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:32,048 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:32,574 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:33,130 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:33,639 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:34,189 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:34,743 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:35,208 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:35,701 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:36,183 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:36,662 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:37,154 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:37,645 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1}}

I've tried

{{hive.exec.input.listing.max.threads}}
{{mapred.dfsclient.parallelism.max}}
{{mapreduce.input.fileinputformat.list-status.num-threads}}

with the defaults, with 1, and with 50... still the same result.


> hiveserver2 - “mapred.FileInputFormat: Total input files to process” - why single threaded?
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21546
>                 URL: https://issues.apache.org/jira/browse/HIVE-21546
>             Project: Hive
>          Issue Type: Bug
>            Reporter: t oo
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)