Posted to user@hive.apache.org by Manish Malhotra <ma...@gmail.com> on 2015/01/21 02:58:58 UTC

Hive Thrift Metastore Server : Socket timeout Exception / Read timeout

Hi All,

I'm using the Hive Thrift metastore server in production, which at peak
handles around 500 requests/min.
Beyond a certain point the Thrift server stops responding and throws the
following exception:
"org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.thrift.transport.TTransportException:
java.net.SocketTimeoutException: Read timed out"

We are using MySQL as the metastore database behind the Thrift server.
The design / architecture is as follows:

Oozie --> Hive Action --> ELB (AWS) --> Hive Thrift (2 servers) --> MySQL
(Master) --> MySQL (Slave).

Software versions:

   Hive: 0.10.0
   Hadoop: 1.2.1


It looks like when the load goes beyond some threshold for certain
operations, the server has trouble responding.
Because Hive jobs sometimes fail due to this issue, we also have an
auto-restart check: if the Thrift server is not responding, it kills and
restarts the service.
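
For reference, the read timeout on the CLI-to-metastore connection that
shows up in the stack below is governed by
hive.metastore.client.socket.timeout; in Hive 0.10 this is an integer
number of seconds with a small default (20 seconds, if I remember
correctly). A minimal hive-site.xml sketch for raising it on the client
side, with 600 chosen purely for illustration and assuming the default is
still in effect:

    <property>
      <name>hive.metastore.client.socket.timeout</name>
      <!-- value is in seconds in Hive 0.10; 600 is illustrative, not a tuned recommendation -->
      <value>600</value>
    </property>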

Other tuning done:

Thrift Server:

Given an 11 GB heap and configured the CMS GC algorithm.

MySQL:

Tuned innodb_buffer, tmp_table and max_heap parameters.

Can somebody please help me understand what the root cause could be, or
has somebody faced a similar issue?

I found one related JIRA: https://issues.apache.org/jira/browse/HCATALOG-541

That JIRA describes the Hive Thrift server hitting an OOM error, but I
didn't see any OOM error in my case.


Regards,
Manish

Full Exception Stack:

    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_database(ThriftHiveMetastore.java:412)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_database(ThriftHiveMetastore.java:399)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:736)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
    at $Proxy7.getDatabase(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1110)
    at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
    at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
    at java.net.SocketInputStream.read(SocketInputStream.java:121)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 34 more
2015-01-20 22:44:12,978 ERROR exec.Task (SessionState.java:printError(401)) - FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1114)
    at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1099)
    at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2206)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)

Re: Hive Thrift Metastore Server : Socket timeout Exception / Read timeout

Posted by Manish Malhotra <ma...@gmail.com>.
Hi All,

Updating further on this issue: I did some more debugging and found some
threads that are using high CPU.
The operation I tried is listPartitions on a table that has a huge number
of partitions (around 1.5 million).
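
For anyone wanting to reproduce this kind of check, high-CPU threads can
be mapped to stack frames with top and jstack; a sketch, where the PID
(12345) is a placeholder and 766 is the decimal thread id matching
nid=0x2fe in the dump below:

    # list the busiest native threads of the metastore JVM (PID is a placeholder)
    top -H -p 12345

    # convert the hot thread id to hex so it can be matched against the nid= field
    printf '%x\n' 766     # -> 2fe

    # take a thread dump and pull out that thread's stack
    jstack 12345 > metastore.jstack
    grep -A 20 'nid=0x2fe' metastore.jstack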

Below is one of the threads that was using a large chunk of CPU:

838 "pool-1-thread-141" prio=10 tid=0x00007fa51c8d3800 nid=0x2fe runnable
[0x00007fa4f8b8a000]
 839    java.lang.Thread.State: RUNNABLE
 840     at java.net.SocketInputStream.socketRead0(Native Method)
 841     at java.net.SocketInputStream.read(SocketInputStream.java:150)
 842     at java.net.SocketInputStream.read(SocketInputStream.java:121)
 843     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
 844     at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
 845     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
 846     - locked <0x0000000564aba418> (a java.io.BufferedInputStream)
 847     at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
 848     at
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
 849     at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
 850     at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
 851     at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
 852     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
 853     at
org.apache.hadoop.hive.metastore.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:48)
 854     at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
 855     at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 856     at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 857     at java.lang.Thread.run(Thread.java:722)
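
Given that partition count, one option worth sketching is to fetch the
partition metadata in batches rather than in a single listPartitions
call, so that no single Thrift response has to carry ~1.5 million
Partition objects. A rough, untested sketch against the metastore client
API (the metastore URI, database/table names, timeout value and batch
size are placeholders, and the exact method signatures should be
double-checked against 0.10):

    import java.util.List;
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
    import org.apache.hadoop.hive.metastore.api.Partition;

    public class BatchedPartitionFetch {
        public static void main(String[] args) throws Exception {
            HiveConf conf = new HiveConf();
            conf.set("hive.metastore.uris", "thrift://metastore-host:9083"); // placeholder URI
            conf.set("hive.metastore.client.socket.timeout", "600");         // seconds in 0.10

            HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
            try {
                // fetch only the partition names first (a much smaller payload) ...
                List<String> names = client.listPartitionNames("mydb", "mytable", (short) -1);

                // ... then hydrate full Partition objects in modest batches
                int batchSize = 1000;
                for (int i = 0; i < names.size(); i += batchSize) {
                    List<String> batch =
                        names.subList(i, Math.min(i + batchSize, names.size()));
                    List<Partition> parts =
                        client.getPartitionsByNames("mydb", "mytable", batch);
                    // process the batch here instead of holding everything in memory
                }
            } finally {
                client.close();
            }
        }
    }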

Regards,
Manish

