You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Mithun Antony (Jira)" <ji...@apache.org> on 2021/03/04 23:29:00 UTC
[jira] [Updated] (HIVE-24849) Create external table socket timeout
when location has large number of files
[ https://issues.apache.org/jira/browse/HIVE-24849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mithun Antony updated HIVE-24849:
---------------------------------
Description:
The create table API call timeout when during an external table creation on a location where the number files in the S3 location is large ( ie: ~10K objects ).
The default timeout `hive.metastore.client.socket.timeout` is `600s` current workaround is it to increase the timeout to a higher value
{code:java}
2021-03-04T01:37:42,761 ERROR [66b8024b-e52f-42b8-8629-a45383bcac0c main([])]: exec.DDLTask (DDLTask.java:failed(639)) - org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:873)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:878)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4356)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1199)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1185)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2399)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:93)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:752)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:740)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
at com.sun.proxy.$Proxy37.createTable(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2330)
at com.sun.proxy.$Proxy37.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:863)
... 25 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 49 more{code}
Understanding is the create table should not check the exisiting files/partitions and the validation is done when the table is read.
The issue could be reproduced executing the below statment in Hive CLI where the location should contain ~10k objects
{{}}
{code:java}
SET hive.stats.autogather=false;
DROP table if exists tTable;
CREATE EXTERNAL TABLE tTable(
id string,
mId string,
wd bigint,
lang string,
tcountry string,
vId bigint
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
LOCATION "s3://<your bucket>/"
TBLPROPERTIES ('serialization.null.format' = ''); {code}
was:
The create table API call timeout when during an external table creation on a location where the number files in the S3 location is large ( ie: ~10K objects ).
The default timeout `hive.metastore.client.socket.timeout` is `600s` current workaround is it to increase the timeout to a higher value
{code:java}
2021-03-04T01:37:42,761 ERROR [66b8024b-e52f-42b8-8629-a45383bcac0c main([])]: exec.DDLTask (DDLTask.java:failed(639)) - org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:873)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:878)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4356)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1199)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1185)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2399)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:93)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:752)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:740)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
at com.sun.proxy.$Proxy37.createTable(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2330)
at com.sun.proxy.$Proxy37.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:863)
... 25 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 49 more{code}
Understanding is the create table should not check the exisiting files/partitions and the validation is done when the table is read.
> Create external table socket timeout when location has large number of files
> ----------------------------------------------------------------------------
>
> Key: HIVE-24849
> URL: https://issues.apache.org/jira/browse/HIVE-24849
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 2.3.4
> Environment: AWS EMR 5.23 with default Hive metastore and external location S3
>
> Reporter: Mithun Antony
> Priority: Major
>
> The create table API call timeout when during an external table creation on a location where the number files in the S3 location is large ( ie: ~10K objects ).
> The default timeout `hive.metastore.client.socket.timeout` is `600s` current workaround is it to increase the timeout to a higher value
> {code:java}
> 2021-03-04T01:37:42,761 ERROR [66b8024b-e52f-42b8-8629-a45383bcac0c main([])]: exec.DDLTask (DDLTask.java:failed(639)) - org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:873)
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:878)
> at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4356)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
> at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:474)
> at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:490)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
> at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
> at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
> at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1199)
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1185)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2399)
> at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:93)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:752)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:740)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
> at com.sun.proxy.$Proxy37.createTable(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2330)
> at com.sun.proxy.$Proxy37.createTable(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:863)
> ... 25 more
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> ... 49 more{code}
>
> Understanding is the create table should not check the exisiting files/partitions and the validation is done when the table is read.
> The issue could be reproduced executing the below statment in Hive CLI where the location should contain ~10k objects
> {{}}
> {code:java}
>
> SET hive.stats.autogather=false;
> DROP table if exists tTable;
> CREATE EXTERNAL TABLE tTable(
> id string,
> mId string,
> wd bigint,
> lang string,
> tcountry string,
> vId bigint
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
> LOCATION "s3://<your bucket>/"
> TBLPROPERTIES ('serialization.null.format' = ''); {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)