Posted to dev@hive.apache.org by "Nathan Clark (Jira)" <ji...@apache.org> on 2019/09/11 17:31:00 UTC

[jira] [Created] (HIVE-22196) Socket timeouts happen when other drivers set DriverManager.loginTimeout

Nathan Clark created HIVE-22196:
-----------------------------------

             Summary: Socket timeouts happen when other drivers set DriverManager.loginTimeout
                 Key: HIVE-22196
                 URL: https://issues.apache.org/jira/browse/HIVE-22196
             Project: Hive
          Issue Type: Bug
          Components: JDBC, Thrift API
    Affects Versions: 3.1.2, 2.0.0, 1.2.1
         Environment: Any client JVM that uses the Hive JDBC driver alongside other JDBC drivers or driver-wrapping libraries (e.g. connection pools). The problem can only occur if the other driver writes a value via {{DriverManager.setLoginTimeout()}}. HikariCP is one suspect; there are probably others.
            Reporter: Nathan Clark


There are a few somewhat sketchy things happening in the Hive/Thrift JDBC client code that result in intermittent "read timed out" (and subsequently "out of sequence") errors whenever another JDBC driver in the same client JVM sets {{DriverManager.loginTimeout}}.
 # The login timeout used to initialize a {{HiveConnection}} is populated from {{DriverManager.loginTimeout}} in the core Java JDBC library. That sounds like a nice, orthodox place to get a login timeout from, but it's fundamentally problematic and really shouldn't be used: it is a *global*, JVM-wide value, and any JDBC driver (or any other piece of code, for that matter) can write to it at will, and is implicitly invited to. The Hive JDBC stack _itself_ writes to this global setting in a couple of places seemingly unrelated to client connection setup. (A concrete sketch of the hazard follows this list.)
 # The _read_ timeout for Thrift _socket-level_ reads is actually populated from this _login_ timeout (a.k.a. "connect timeout") setting. (See Thrift's {{TSocket(String host, int port, int timeout)}} and its callers in {{HiveAuthFactory}}. Note also the numerous code comments that speak of setting {{SO_TIMEOUT}}, the socket read timeout, while the actual code references a variable called {{loginTimeout}}.) Socket reads can occur thousands of times in an application that runs many Hive queries, and each read's duration is far less predictable than that of getting a connection, which typically happens at most a few times. So a login timeout, which seems to usually receive a reasonable value of around 30 seconds if it is constrained at all, is very likely to be inadequate for the occasional socket read, and "occasional" across thousands of reads is way too often.
 # There seems to be no option to set this login timeout (or the actual read timeout) explicitly as an externalized override setting (but see HIVE-12371). 
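
To make the global-singleton hazard concrete, here is a minimal, self-contained sketch (the class name and the 30-second value are illustrative assumptions; HikariCP is just one example of a library that might write the value):

{code:java}
import java.sql.DriverManager;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the hazard. Nothing here is Hive-specific: any code in
// the JVM can overwrite the one global login timeout at any time.
public class LoginTimeoutHazardDemo {
  public static void main(String[] args) {
    // Some other driver or pool (HikariCP is one suspect) sets its own value:
    DriverManager.setLoginTimeout(30); // seconds, JVM-global

    // Later, HiveConnection.setupLoginTimeout() effectively does this, and
    // the result ends up as the Thrift *socket read* timeout (SO_TIMEOUT):
    long timeOut = TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());
    System.out.println("Every subsequent Hive socket read must now finish within "
        + timeOut + " ms");
  }
}
{code}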

*Summary:* {{DriverManager.loginTimeout}} can be innocently set by any JDBC driver present in the JVM, you can't override it, and it's misused by Hive as a socket read timeout. There's no way to prevent intermittent read timeouts in this scenario unless you're lucky enough to find the offending JDBC driver and reconfigure its timeout setting to something workable for Hive socket reads.
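
Until there is a real fix, one best-effort client-side mitigation is to pin the global value back to 0 immediately before opening each Hive connection. This is only a sketch (class name and URL are placeholders), and it is inherently racy: another thread or driver can rewrite the setting between the write and Hive's read of it.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class HiveConnections {
  // Placeholder endpoint; substitute your real HiveServer2 URL.
  private static final String HIVE_URL = "jdbc:hive2://hs2-host:10000/default";

  // Best-effort mitigation: force loginTimeout to 0 (no timeout) while Hive
  // reads it, then restore the previous value for the other drivers.
  public static synchronized Connection open(String user, String pass)
      throws SQLException {
    int previous = DriverManager.getLoginTimeout();
    try {
      DriverManager.setLoginTimeout(0);
      return DriverManager.getConnection(HIVE_URL, user, pass);
    } finally {
      DriverManager.setLoginTimeout(previous);
    }
  }
}
{code}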

An easy, crude patch:

modify the first line of {{HiveConnection.setupLoginTimeout()}} from:

{{long timeOut = TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());}}

to:

{{long timeOut = TimeUnit.SECONDS.toMillis(0);}}

This is of course not a robust fix, as server issues during socket reads can result in a hung client thread. Some other hardcoded value might be more advisable, as long as it's long enough to prevent spurious read timeouts.
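
For context, the crudely patched method would look roughly like this. Only the first line is quoted from the source above; the surrounding clamp is paraphrased from memory of the Hive source and should be verified against your version. A longer hardcoded default could be substituted for the 0.

{code:java}
// Inside HiveConnection; loginTimeout is an int field on the class.
private void setupLoginTimeout() {
  // Was: long timeOut = TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());
  long timeOut = TimeUnit.SECONDS.toMillis(0); // 0 = no socket read timeout
  if (timeOut > Integer.MAX_VALUE) {
    loginTimeout = Integer.MAX_VALUE;
  } else {
    loginTimeout = (int) timeOut;
  }
}
{code}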

The right approach is to prioritize HIVE-12371 (proposed socket timeout override setting that doesn't depend on {{DriverManager.loginTimeout}}) and implement it in all possible versions.
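
A hedged sketch of what such an override might look like follows. The {{socketTimeout}} URL parameter name is purely hypothetical here (HIVE-12371 would define the real setting); the point is only that the read timeout should be configurable per connection instead of inherited from the global {{DriverManager.loginTimeout}}.

{code:java}
// Hypothetical sketch inside HiveConnection, in the spirit of HIVE-12371.
// "socketTimeout" is NOT a real Hive JDBC setting; it stands in for whatever
// externalized override HIVE-12371 ends up defining.
private void setupLoginTimeout() {
  String override = sessConfMap.get("socketTimeout"); // hypothetical URL param
  long timeOut;
  if (override != null) {
    timeOut = TimeUnit.SECONDS.toMillis(Long.parseLong(override));
  } else {
    timeOut = TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());
  }
  loginTimeout = (int) Math.min(timeOut, Integer.MAX_VALUE);
}
{code}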


