Posted to dev@hive.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2017/06/08 03:30:18 UTC

[jira] [Created] (HIVE-16854) SparkClientFactory is locked too aggressively

Xuefu Zhang created HIVE-16854:
----------------------------------

             Summary: SparkClientFactory is locked too aggressively
                 Key: HIVE-16854
                 URL: https://issues.apache.org/jira/browse/HIVE-16854
             Project: Hive
          Issue Type: Bug
          Components: Spark
    Affects Versions: 1.1.0
            Reporter: Xuefu Zhang


Most methods in SparkClientFactory are synchronized on the SparkClientFactory singleton. Some of these methods are very expensive, such as createClient(), which returns a SparkClientImpl instance. Creating a SparkClientImpl instance requires starting a remote driver that connects back to the RPCServer. This can take a long time, for example when the YARN queue is busy. When that happens, all pending calls on SparkClientFactory have to wait as well.

In our case, hive.spark.client.server.connect.timeout is set to 1 hour. As a result, some queries wait for hours before starting.

The current implementation appears to effectively serialize all remote driver launches. If one of them is slow, all subsequent launches have to wait behind it.
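
To illustrate the effect, here is a minimal, self-contained sketch. It is not the actual Hive code: FactoryLockSketch, createClientCoarse(), createClientNarrow(), slowLaunch() and elapsedMillis() are hypothetical names, and Thread.sleep() stands in for the remote driver launch. It compares holding the class lock for the whole launch with holding it only while reading shared state.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a client factory; NOT Hive's SparkClientFactory.
public class FactoryLockSketch {
    // Hypothetical shared state that genuinely needs the lock.
    private static boolean initialized = true;

    // Stand-in for the expensive remote driver launch.
    private static void slowLaunch(long millis) throws InterruptedException {
        Thread.sleep(millis);
    }

    // Coarse locking: the whole launch runs while holding the class lock,
    // so concurrent callers are served one at a time.
    static synchronized void createClientCoarse(long millis) throws InterruptedException {
        if (!initialized) {
            throw new IllegalStateException("factory not initialized");
        }
        slowLaunch(millis);
    }

    // Narrow locking (one possible alternative, an assumption of this sketch):
    // hold the lock only to check shared state, then launch outside it.
    static void createClientNarrow(long millis) throws InterruptedException {
        synchronized (FactoryLockSketch.class) {
            if (!initialized) {
                throw new IllegalStateException("factory not initialized");
            }
        }
        slowLaunch(millis);
    }

    interface Launch {
        void run(long millis) throws InterruptedException;
    }

    // Runs `callers` concurrent launches and returns the total wall-clock time.
    static long elapsedMillis(int callers, long millis, Launch launch) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(callers);
        CountDownLatch done = new CountDownLatch(callers);
        long start = System.nanoTime();
        for (int i = 0; i < callers; i++) {
            pool.submit(() -> {
                try {
                    launch.run(millis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        pool.shutdown();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("coarse: " + elapsedMillis(4, 200, FactoryLockSketch::createClientCoarse) + " ms");
        System.out.println("narrow: " + elapsedMillis(4, 200, FactoryLockSketch::createClientNarrow) + " ms");
    }
}
```

With four callers and a 200 ms simulated launch, the coarse variant should take roughly four times as long as the narrow one, because each caller waits for the previous launch to finish. With a one-hour connect timeout in place of 200 ms, the same queueing would produce the multi-hour waits described above.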

An HS2 stack trace is attached for reference. It is based on an earlier version of Hive, so the line numbers might be slightly off.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)