Posted to issues@livy.apache.org by "Kim Hammar (JIRA)" <ji...@apache.org> on 2019/04/03 12:18:00 UTC

[jira] [Created] (LIVY-581) Edge-case where spark properties are overridden by Livy in YARN environments

Kim Hammar created LIVY-581:
-------------------------------

             Summary: Edge-case where spark properties are overridden by Livy in YARN environments
                 Key: LIVY-581
                 URL: https://issues.apache.org/jira/browse/LIVY-581
             Project: Livy
          Issue Type: Bug
            Reporter: Kim Hammar
             Fix For: 0.7.0


We use Livy inside our multi-tenant data science platform, which runs on YARN and HDFS. Recently we added support for SparkSQL on Hive by placing the necessary jar files in spark/jars, adding hive-site.xml to spark/conf, and setting livy.repl.enableHiveContext=true in livy.conf.
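For clarity, the Hive enablement amounts to roughly the following (paths and file names are illustrative for our installation, not a prescription):

    # SparkSQL-on-Hive jars placed in the Spark jars directory
    $SPARK_HOME/jars/hive-*.jar

    # Hive configuration placed in the Spark conf directory
    $SPARK_HOME/conf/hive-site.xml

    # livy.conf
    livy.repl.enableHiveContext = true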

However, yesterday I discovered that when Livy starts the Spark session it overrides our properties spark.yarn.dist.files and spark.yarn.jars; this was never an issue before we enabled Hive. Looking into the code, I found that when Hive is enabled, Livy appends (if not already present) hive-site.xml to the list of files the user specified in the spark.files property, and the necessary Hive jars to the list of jars the user request specified in the spark.jars property; see the related code snippet here:

[https://github.com/apache/incubator-livy/blob/56c76bc2d4563593edce062a563603fe63e5a431/server/src/main/scala/org/apache/livy/server/interactive/InteractiveSession.scala#L285]
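A simplified sketch of that behaviour (not the actual Livy source; the function and variable names here are illustrative): when livy.repl.enableHiveContext is true, hive-site.xml is appended to spark.files and the Hive jars to spark.jars, skipping entries that are already present.

    import scala.collection.mutable

    // Append extra entries to a comma-separated conf value, skipping duplicates.
    def mergeConfList(conf: mutable.Map[String, String],
                      key: String,
                      extra: Seq[String]): Unit = {
      val existing = conf.get(key).map(_.split(",").toSeq).getOrElse(Seq.empty)
      val merged = existing ++ extra.filterNot(existing.contains)
      if (merged.nonEmpty) conf(key) = merged.mkString(",")
    }

    // e.g. mergeConfList(requestConf, "spark.files", Seq("file:///etc/hive/conf/hive-site.xml"))
    //      mergeConfList(requestConf, "spark.jars", hiveJars)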

Now what seems to happen is that if all of spark.files, spark.jars, spark.yarn.dist.files, and spark.yarn.jars are non-null when the job is submitted (spark.files and spark.jars filled in by Livy; spark.yarn.dist.files and spark.yarn.jars filled in by the user request from our platform), then spark.yarn.dist.files gets set to spark.files and spark.yarn.jars gets set to spark.jars.

Since spark.files and spark.yarn.dist.files, for example, have the same semantics but are meant for non-YARN and YARN deployments respectively, Spark simply overwrites spark.yarn.dist.files with the contents of spark.files. In general these configuration properties should be treated as mutually exclusive: you should not mix them, since one is designed for YARN mode and the other for non-YARN mode.
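To make the edge case concrete, here is roughly what the submitted configuration looks like in our scenario (the property values are made-up examples):

    # In the session request from our platform:
    spark.yarn.dist.files = hdfs:///user/project/extra-conf.xml
    spark.yarn.jars       = hdfs:///user/spark/jars/*

    # Added by Livy because Hive is enabled:
    spark.files = file:///srv/hive/conf/hive-site.xml
    spark.jars  = file:///srv/hive/lib/hive-exec.jar

    # Observed outcome in YARN mode: spark.yarn.dist.files and spark.yarn.jars
    # end up reflecting spark.files and spark.jars, and the user-supplied
    # values are lost.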

Our current solution is to deploy a fork of Livy on our platform where the code checks whether the user request has populated the spark.yarn.X properties; if so, all Livy-generated properties are appended to the YARN ones, otherwise they are appended to the regular spark.X properties. See the code snippet here:

https://github.com/Limmen/incubator-livy/commit/aa06f896753ae9d6ce6aa66a80cca36a82f84202
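A rough sketch of that workaround (simplified; not the exact code in the commit above, and the names are illustrative): if the user request already populates the spark.yarn.* variant of a property, the Livy-generated Hive files/jars are appended there, otherwise they fall back to the plain spark.* property.

    import scala.collection.mutable

    // Append extra entries to a comma-separated conf value, skipping duplicates.
    def appendTo(conf: mutable.Map[String, String], key: String, extra: Seq[String]): Unit = {
      val existing = conf.get(key).map(_.split(",").toSeq).getOrElse(Seq.empty)
      conf(key) = (existing ++ extra.filterNot(existing.contains)).mkString(",")
    }

    // Prefer the spark.yarn.* key when the request already populates it.
    def mergeHiveDeps(conf: mutable.Map[String, String],
                      hiveSite: String,
                      hiveJars: Seq[String]): Unit = {
      val filesKey = if (conf.contains("spark.yarn.dist.files")) "spark.yarn.dist.files" else "spark.files"
      val jarsKey  = if (conf.contains("spark.yarn.jars")) "spark.yarn.jars" else "spark.jars"
      appendTo(conf, filesKey, Seq(hiveSite))
      appendTo(conf, jarsKey, hiveJars)
    }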


