You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Manish Malhotra <ma...@gmail.com> on 2015/01/21 02:58:58 UTC
Hive Thrift Metastore Server : Socket timeout Exception / Read timeout
Hi All,
I'm using Hive Thrift Server in Production which at peak handles around 500
req/min.
After certain point the Hive Thrift Server is going into the no response
mode and throws
Following exception
"org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.thrift.transport.TTransportException:
java.net.SocketTimeoutException: Read timed out"
As the metastore we are using MySQL, that is being used by Thrift server.
The design / architecture is like this:
Oozie -- > Hive Action --> ELB (AWS) --> Hive Thrift ( 2 servers) --> MySQL
(Master) -- > MySQL (Slave).
Software versions:
Hive version : 0.10.0
Hadoop: 1.2.1
Looks like when the load is beyond some threshold for certain operations it
is having problem in responding.
As the hive jobs sometimes fails because of this issue, we also have a
auto-restart check to see if the Thrift server is not responding, it stops
/ kills and restart the service.
Other tuning done:
Thrift Server:
Given 11gb heap, and configured CMS GC algo.
MySQL:
Tuned innodb_buffer, tmp_table and max_heap parameters.
So, can somebody please help to understand, what could be the root cause
for this or somebody faced the similar issue.
I found one related JIRA :https://issues.apache.org/jira/browse/HCATALOG-541
But this JIRA shows that Hive Thrift Server shows OOM error, but in my case
I didnt see any OOM error in my case.
Regards,
Manish
Full Exception Stack:
at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
at $Proxy7.getDatabase(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1110)
at
org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
at
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
at
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 34 more
2015-01-20 22:44:12,978 ERROR exec.Task (SessionState.java:printError(401))
- FAILED: Error in metadata:
org.apache.thrift.transport.TTransportException:
java.net.SocketTimeoutException: Read timed out
org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.thrift.transport.TTransportException:
java.net.SocketTimeoutException: Read timed out
at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1114)
at
org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
at
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
at
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
Re: Hive Thrift Metastore Server : Socket timeout Exception / Read timeout
Posted by Manish Malhotra <ma...@gmail.com>.
Hi All,
Updating further on this issue, as did some further debugging and see some
of the threads which are using high CPU.
And the operation I tried is listPartition on a table that has huge number
of partitions ( around 1.5 million).
One of the thread that was using large chunk of CPU.
838 "pool-1-thread-141" prio=10 tid=0x00007fa51c8d3800 nid=0x2fe runnable
[0x00007fa4f8b8a000]
839 java.lang.Thread.State: RUNNABLE
840 at java.net.SocketInputStream.socketRead0(Native Method)
841 at java.net.SocketInputStream.read(SocketInputStream.java:150)
842 at java.net.SocketInputStream.read(SocketInputStream.java:121)
843 at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
844 at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
845 at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
846 - locked <0x0000000564aba418> (a java.io.BufferedInputStream)
847 at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
848 at
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
849 at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
850 at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
851 at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
852 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
853 at
org.apache.hadoop.hive.metastore.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:48)
854 at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
855 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
856 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
857 at java.lang.Thread.run(Thread.java:722)
Regards,
Manish
On Tue, Jan 20, 2015 at 5:58 PM, Manish Malhotra <
manish.hadoop.work@gmail.com> wrote:
> Hi All,
>
> I'm using Hive Thrift Server in Production which at peak handles around
> 500 req/min.
> After certain point the Hive Thrift Server is going into the no response
> mode and throws
> Following exception
> "org.apache.hadoop.hive.ql.metadata.HiveException:
> org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out"
>
> As the metastore we are using MySQL, that is being used by Thrift server.
> The design / architecture is like this:
>
> Oozie -- > Hive Action --> ELB (AWS) --> Hive Thrift ( 2 servers) -->
> MySQL (Master) -- > MySQL (Slave).
>
> Software versions:
>
> Hive version : 0.10.0
> Hadoop: 1.2.1
>
>
> Looks like when the load is beyond some threshold for certain operations
> it is having problem in responding.
> As the hive jobs sometimes fails because of this issue, we also have a
> auto-restart check to see if the Thrift server is not responding, it stops
> / kills and restart the service.
>
> Other tuning done:
>
> Thrift Server:
>
> Given 11gb heap, and configured CMS GC algo.
>
> MySQL:
>
> Tuned innodb_buffer, tmp_table and max_heap parameters.
>
> So, can somebody please help to understand, what could be the root cause
> for this or somebody faced the similar issue.
>
> I found one related JIRA :
> https://issues.apache.org/jira/browse/HCATALOG-541
>
> But this JIRA shows that Hive Thrift Server shows OOM error, but in my
> case I didnt see any OOM error in my case.
>
>
> Regards,
> Manish
>
> Full Exception Stack:
>
> at
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
> at $Proxy7.getDatabase(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1110)
> at
> org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
> at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> Caused by: java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:150)
> at java.net.SocketInputStream.read(SocketInputStream.java:121)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> ... 34 more
> 2015-01-20 22:44:12,978 ERROR exec.Task
> (SessionState.java:printError(401)) - FAILED: Error in metadata:
> org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out
> org.apache.hadoop.hive.ql.metadata.HiveException:
> org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out
> at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1114)
> at
> org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
> at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>
>