Posted to issues@hive.apache.org by "Adam Szita (JIRA)" <ji...@apache.org> on 2019/01/07 15:03:00 UTC

[jira] [Assigned] (HIVE-21096) Remove unnecessary Spark dependency from HS2 process

     [ https://issues.apache.org/jira/browse/HIVE-21096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Szita reassigned HIVE-21096:
---------------------------------


> Remove unnecessary Spark dependency from HS2 process
> ----------------------------------------------------
>
>                 Key: HIVE-21096
>                 URL: https://issues.apache.org/jira/browse/HIVE-21096
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2, Spark
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>            Priority: Major
>
> When a Hive on Spark job is kicked off, most of the work is done by the RemoteDriver, which is a separate process. There are a couple of smaller parts of the code where the HS2 process depends on Spark jars; these include, for example, receiving stats from the driver or putting together a Spark conf object, used mostly during communication with the RemoteDriver.
> We can limit the data types used for such communication so that we don't use (and serialize) types that are in the Spark codebase, and hence we can refactor our code so that only the RemoteDriver process needs the Spark jars (a sketch of the idea follows below).
> I think this approach would be cleaner from a dependency point of view, and also less error-prone when users have to compile the classpath for their HS2 processes.
> (E.g., due to a change between Spark 2.2 and 2.4 we also had to include spark-unsafe*.jar, even though it was an internal change to Spark.)
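> As a minimal, hypothetical sketch of the idea (the class and member names below are illustrative only, not taken from an actual patch): the RemoteDriver copies values out of Spark's metric objects into a plain, JDK-only DTO before sending it over the wire, so HS2 never has to load or deserialize Spark classes:
> {code:java}
> import java.io.Serializable;
> import java.util.HashMap;
> import java.util.Map;
>
> // Spark-free stats holder; only JDK types cross the HS2 <-> RemoteDriver wire.
> public class TaskStatsDto implements Serializable {
>   private static final long serialVersionUID = 1L;
>
>   // e.g. counter values copied out of Spark's metric objects inside the driver
>   private final Map<String, Long> counters = new HashMap<>();
>
>   public void setCounter(String name, long value) {
>     counters.put(name, value);
>   }
>
>   public long getCounter(String name) {
>     return counters.getOrDefault(name, 0L);
>   }
>
>   public Map<String, Long> getCounters() {
>     return counters;
>   }
> }
> {code}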



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)