You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by steveloughran <gi...@git.apache.org> on 2015/10/22 23:18:33 UTC

[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

GitHub user steveloughran opened a pull request:

    https://github.com/apache/spark/pull/9232

    Stevel/patches/spark 11265 hive tokens

    This is just the first "it compiles" stage of the patch: it doesn't contain any fixes
    
    I've just pulled the current code out of the yarn client class into {{YarnSparkHadoopUtil}}, split it into an inner exception-throwing method and an outer some-exception-swallowing class, so that the inner class can be used in tests. And, taking advantage of the fact that {{HiveConf extends Configuration}}, avoiding so much introspection.
    
    Todo
    
    1. Test suite to verify that things fail
    1. fix the test
    
    This may need the yarn tests to depend on hive-1.2.1 core JAR at *test* only

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/spark stevel/patches/SPARK-11265-hive-tokens

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9232.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9232
    
----
commit 3bb1b921fd229aff17bfe7860f479959987c0736
Author: Steve Loughran <st...@hortonworks.com>
Date:   2015-10-22T21:08:26Z

    SPARK-11265 YarnClient can't get tokens to talk to Hive in a secure cluster. This is just the first "it compiles" stage of the patch: no tests or attempts to run things yet

commit 1fcb42ec819052f1c3dd70c7543eaa824c3be23c
Author: Steve Loughran <st...@hortonworks.com>
Date:   2015-10-22T21:15:07Z

    SPARK-11265 reduce size of patch on imports, even though the existing set doesn't follow the current rules precisely

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43266858
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +    }
    +    assertNestedHiveException(e)
    +    // expect exception trapping code to unwind this hive-side exception
    +    assertNestedHiveException(intercept[InvocationTargetException] {
    +      util.obtainTokenForHiveMetastore(hadoopConf)
    +    })
    +  }
    +
    +  def assertNestedHiveException(e: InvocationTargetException): Throwable = {
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    +    }
    +    if (!inner.isInstanceOf[HiveException]) {
    +      fail(s"Not a hive exception", inner)
    +    }
    +    inner
    +  }
    +
    +  test("handleTokenIntrospectionFailure") {
    +    val util = new YarnSparkHadoopUtil
    +    // downgraded exceptions
    +    util.handleTokenIntrospectionFailure("hive", new ClassNotFoundException("cnfe"))
    --- End diff --
    
    BTW, if you really want to implement a shared policy, I'd recommend adding something like `scala.util.control.NonFatal`. That makes the exception handling cleaner; it would look more like this:
    
        try {
          // code that can throw
        } catch {
          case IgnorableException(e) => logDebug(...)
        }



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43336052
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +    }
    +    assertNestedHiveException(e)
    +    // expect exception trapping code to unwind this hive-side exception
    +    assertNestedHiveException(intercept[InvocationTargetException] {
    +      util.obtainTokenForHiveMetastore(hadoopConf)
    +    })
    +  }
    +
    +  def assertNestedHiveException(e: InvocationTargetException): Throwable = {
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    +    }
    +    if (!inner.isInstanceOf[HiveException]) {
    +      fail(s"Not a hive exception", inner)
    +    }
    +    inner
    +  }
    +
    +  test("handleTokenIntrospectionFailure") {
    +    val util = new YarnSparkHadoopUtil
    +    // downgraded exceptions
    +    util.handleTokenIntrospectionFailure("hive", new ClassNotFoundException("cnfe"))
    --- End diff --
    
    Or yet another option is to have the method that handles exception take a closure, instead of the current approach of a method that matches on an exception parameter. e.g.:
    
        def tryToGetTokens(service: String)(fn: () => Option[Token]): Option[Token] = {
          try {
            fn()
          } catch {
             ...
          }
        }
    
        def obtainTokenForHiveMetastore... = {
          tryToGetTokens("Hive") { obtainTokenForHiveMetastoreInner(...) }
        }
    
    I mostly dislike that exception handling feels like it's scattered around. You have a catch block in one place, then match on the exception somewhere else, it makes it hard to see what's really being done in one look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43194609
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,99 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case t: Throwable => {
    --- End diff --
    
    those exceptions aren't being rethrown though, are they? So its logging the full stack @ debug, and a one-liner for most.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150813301
  
    **[Test build #44296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44296/consoleFull)** for PR 9232 at commit [`89edfa6`](https://github.com/apache/spark/commit/89edfa649ef82e126f492df0d521dd7d3079ad76).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150813327
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42939243
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +      fail(s"Expected an exception, got the token $token")
    --- End diff --
    
    Why the `fail`? That's what the `intercept` is for. If you're expecting a different exception, then change the `intercept`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150586249
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42898469
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -1337,55 +1337,8 @@ object Client extends Logging {
           conf: Configuration,
           credentials: Credentials) {
         if (shouldGetTokens(sparkConf, "hive") && UserGroupInformation.isSecurityEnabled) {
    --- End diff --
    
    Actually I don't really think the delegation tokens are needed in client mode (for Hive only). Only the driver talks to the metastore.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151818616
  
    **[Test build #44522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44522/consoleFull)** for PR 9232 at commit [`217faba`](https://github.com/apache/spark/commit/217faba0d372ac66c57420372db62244e628da39).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:\n  * `        logInfo(s\"$service class not found $e\")`\n  * `        logDebug(\"$service class not found\", e)`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150813328
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44296/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43369014
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -1337,55 +1337,9 @@ object Client extends Logging {
           conf: Configuration,
           credentials: Credentials) {
         if (shouldGetTokens(sparkConf, "hive") && UserGroupInformation.isSecurityEnabled) {
    -      val mirror = universe.runtimeMirror(getClass.getClassLoader)
    --- End diff --
    
    this should go to `utils.getContextOrSparkClassLoader()`; notable that scalastyle doesn't pick up on this, even though it rejects `Class.forName()` since SPARK-8962


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150811768
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152325087
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42945007
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +      fail(s"Expected an exception, got the token $token")
    +    }
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    +    }
    +    if (!inner.isInstanceOf[HiveException]) {
    +      fail(s"Not a hive exception", inner)
    --- End diff --
    
    I want to include the inner exception if it's of the wrong type, so that the Junit/jenkins report can diagnose the failure. an `assert(inner.isInstanceOf[HiveException])` will say when the exception was of the wrong type, but not include the details, including the stack trace.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150693686
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152781530
  
    Latest patch LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42944993
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,104 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: NoSuchMethodException =>
    +        logInfo("Hive Method not found", e)
    +        None
    +      case e: ClassNotFoundException =>
    --- End diff --
    
    +1. I'd left it in there as it may have had a valid reason for being there, but i do things it's correct. Detecting config problems, that is something to throw up.
    
    Note that `Client.obtainTokenForHBase()` has similar behaviour; this patch doesn't address it. When someone sits down to do it, the policy about how to react to failures could be converted into a wrapper around a closure which executes the token retrieval (here `obtainTokenForHiveMetastoreInner`), so there'd be no divergence.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by chesterxgchen <gi...@git.apache.org>.
Github user chesterxgchen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42810773
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -337,5 +339,82 @@ object YarnSparkHadoopUtil {
           conf.getInt("spark.executor.instances", targetNumExecutors)
         }
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore. Exceptions are caught and logged
    +   */
    +  def obtainTokenForHiveMetastore(
    --- End diff --
    
    I like the method return Option


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150684298
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152334119
  
    **[Test build #44637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44637/consoleFull)** for PR 9232 at commit [`dd8dea9`](https://github.com/apache/spark/commit/dd8dea926cec22853fb665c83f13c78a4128c20a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:\n  * `        logInfo(s\"Hive class not found $e\")`\n  * `        logDebug(\"Hive class not found\", e)`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151673386
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43014420
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,117 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case e: NoClassDefFoundError =>
    +        logDebug(s"$service class not found", e)
    +      case e: InvocationTargetException =>
    +        // problem talking to the metastore or other hive-side exception
    +        logInfo(s"$service method invocation failed", e)
    +        throw if (e.getCause != null) e.getCause else e
    +      case e: ReflectiveOperationException =>
    +        // any other reflection failure log at error and rethrow
    +        logError(s"$service Class operation failed", e)
    +        throw e;
    +      case e: RuntimeException =>
    +        // any runtime exception, including Illegal Argument Exception
    +        throw e
    +      case t: Throwable => {
    +        val msg = s"$service: Unexpected Exception " + t
    +        logError(msg, t)
    +        throw new RuntimeException(msg, t)
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Inner routine to obtains token for the Hive metastore; exceptions are raised on any problem.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this.
    +   * @param username the username of the principal requesting the delegating token.
    +   * @return a delegation token
    +   */
    +  private[yarn] def obtainTokenForHiveMetastoreInner(conf: Configuration,
    +      username: String): Option[Token[DelegationTokenIdentifier]] = {
    +    val mirror = universe.runtimeMirror(getClass.getClassLoader)
    +
    +    // the hive configuration class is a subclass of Hadoop Configuration, so can be cast down
    +    // to a Configuration and used without reflection
    +    val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
    +    // using the (Configuration, Class) constructor allows the current configuratin to be included
    +    // in the hive config.
    +    val ctor = hiveConfClass.getDeclaredConstructor(classOf[Configuration],
    +      classOf[Object].getClass)
    +    val hiveConf = ctor.newInstance(conf, hiveConfClass).asInstanceOf[Configuration]
    +    val metastore_uri = hiveConf.getTrimmed("hive.metastore.uris", "")
    +
    +    // Check for local metastore
    +    if (metastore_uri.nonEmpty) {
    +      if (username.isEmpty) {
    --- End diff --
    
    `require(username.nonEmpty)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43587708
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,76 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"Hive class not found $e")
    +        logDebug("Hive class not found", e)
    --- End diff --
    
    They're not exactly the same.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43193401
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,99 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case t: Throwable => {
    --- End diff --
    
    really, you don't need this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152328362
  
    **[Test build #44637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44637/consoleFull)** for PR 9232 at commit [`dd8dea9`](https://github.com/apache/spark/commit/dd8dea926cec22853fb665c83f13c78a4128c20a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151176637
  
    **[Test build #44358 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44358/consoleFull)** for PR 9232 at commit [`0356c4c`](https://github.com/apache/spark/commit/0356c4c6f9356ecd4234dbe6a6f1b2f7b38b5e14).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151677558
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43488737
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,81 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"Hive class not found $e")
    +        logDebug("Hive class not found", e)
    +        None
    +      case t: Throwable =>
    --- End diff --
    
    OK


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150360112
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151674747
  
    **[Test build #44475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44475/consoleFull)** for PR 9232 at commit [`00cb5a7`](https://github.com/apache/spark/commit/00cb5a7323a4f91adfa5c4273c8a6bcbc67dc008).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150726454
  
    @steveloughran could you use PR titles following the convention described in the following document?
    https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-PullRequest


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151818863
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44522/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150366139
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44176/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150662508
  
    **[Test build #44244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44244/consoleFull)** for PR 9232 at commit [`5158afe`](https://github.com/apache/spark/commit/5158afe1e15966f0eb94b67bd097709d7b333606).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150366133
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150589550
  
    **[Test build #44223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44223/consoleFull)** for PR 9232 at commit [`9630a9d`](https://github.com/apache/spark/commit/9630a9d80bc33f738dad6ebc841cb4aea058056d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152497070
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150657934
  
    **[Test build #44244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44244/consoleFull)** for PR 9232 at commit [`5158afe`](https://github.com/apache/spark/commit/5158afe1e15966f0eb94b67bd097709d7b333606).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43240828
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,99 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case t: Throwable => {
    --- End diff --
    
    the reason I'd pulled it out was to have an isolated policy which could be both tested without mocking failures, and be re-used in the HBase token retrieval, which needs an identical set of clauses.
    
    Given the policy has now been simplified so much, the method is now pretty much unused; I can pull it. But still the HBase token logic will need to be 100% in sync.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151661825
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43368611
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +    }
    +    assertNestedHiveException(e)
    +    // expect exception trapping code to unwind this hive-side exception
    +    assertNestedHiveException(intercept[InvocationTargetException] {
    +      util.obtainTokenForHiveMetastore(hadoopConf)
    +    })
    +  }
    +
    +  def assertNestedHiveException(e: InvocationTargetException): Throwable = {
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    +    }
    +    if (!inner.isInstanceOf[HiveException]) {
    +      fail(s"Not a hive exception", inner)
    +    }
    +    inner
    +  }
    +
    +  test("handleTokenIntrospectionFailure") {
    +    val util = new YarnSparkHadoopUtil
    +    // downgraded exceptions
    +    util.handleTokenIntrospectionFailure("hive", new ClassNotFoundException("cnfe"))
    --- End diff --
    
    I'd thought about that purer option; it's easily testable too.
    
    given the policy is so simple now, I'll just pull the catch handler into place, and replicate it in the fixed hbase code after; its simple enough that a review should suffice.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151140385
  
    Thomas, I'm not sure that they actually pulled something, but instead overloaded the get() operation, which confused the reflection logic.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43204781
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,97 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case t: Throwable => {
    +        throw t
    --- End diff --
    
    Here the exception is thrown. I know swallow the exception is bad, but what happen if the user does not want to access the hive metastore but want to use spark even if token cannot be acquired? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151818859
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43478531
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,81 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"Hive class not found $e")
    +        logDebug("Hive class not found", e)
    +        None
    +      case t: Throwable =>
    +        throw t
    +    }
    +  }
    +
    +  /**
    +   * Inner routine to obtains token for the Hive metastore; exceptions are raised on any problem.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this.
    +   * @param username the username of the principal requesting the delegating token.
    +   * @return a delegation token
    +   */
    +  private[yarn] def obtainTokenForHiveMetastoreInner(conf: Configuration,
    +      username: String): Option[Token[DelegationTokenIdentifier]] = {
    +    val mirror = universe.runtimeMirror(Utils.getContextOrSparkClassLoader)
    +
    +    // the hive configuration class is a subclass of Hadoop Configuration, so can be cast down
    +    // to a Configuration and used without reflection
    +    val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
    +    // using the (Configuration, Class) constructor allows the current configuratin to be included
    +    // in the hive config.
    +    val ctor = hiveConfClass.getDeclaredConstructor(classOf[Configuration],
    +      classOf[Object].getClass)
    +    val hiveConf = ctor.newInstance(conf, hiveConfClass).asInstanceOf[Configuration]
    +    val metastoreUri = hiveConf.getTrimmed("hive.metastore.uris", "")
    +
    +    // Check for local metastore
    +    if (metastoreUri.nonEmpty) {
    +      require(username.nonEmpty, "Username undefined")
    +      val principalKey = "hive.metastore.kerberos.principal"
    +      val principal = hiveConf.getTrimmed(principalKey, "")
    +      require(principal.nonEmpty, "Hive principal $principalKey undefined")
    +      logDebug(s"Getting Hive delegation token for $username against $principal at $metastoreUri")
    +      val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
    +      val closeCurrent = hiveClass.getMethod("closeCurrent")
    +      try {
    +        // get all the instance methods before invoking any
    +        val getDelegationToken = hiveClass.getMethod("getDelegationToken",
    +          classOf[String], classOf[String])
    +        val getHive = hiveClass.getMethod("get", hiveConfClass)
    +
    +        // invoke
    +        val hive = getHive.invoke(null, hiveConf)
    +        val tokenStr = getDelegationToken.invoke(hive, username, principal)
    +          .asInstanceOf[java.lang.String]
    +        val hive2Token = new Token[DelegationTokenIdentifier]()
    +        hive2Token.decodeFromUrlString(tokenStr)
    +        Some(hive2Token)
    +      } finally {
    +        try {
    +          closeCurrent.invoke(null)
    +        } catch {
    +          case e: Exception => logWarning("In Hive.closeCurrent()", e)
    --- End diff --
    
    minor: you could use `Utils.tryLogNonFatalError`, although that uses `logError`. Should be fine here, though, since this shouldn't really happen normally.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43014139
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,117 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case e: NoClassDefFoundError =>
    +        logDebug(s"$service class not found", e)
    +      case e: InvocationTargetException =>
    +        // problem talking to the metastore or other hive-side exception
    +        logInfo(s"$service method invocation failed", e)
    +        throw if (e.getCause != null) e.getCause else e
    +      case e: ReflectiveOperationException =>
    +        // any other reflection failure log at error and rethrow
    +        logError(s"$service Class operation failed", e)
    +        throw e;
    --- End diff --
    
    nuke the semi-colon. Also, I don't think you need to catch / log / re-throw any of these. Just let them bubble up and fail the app. The user will see the error. As I suggested before, the only `case` you need here is for `ClassNotFoundException`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150589707
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151183054
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151145666
  
    Thomas: been through the hive versions and they didn't pull anything, simply added another overloaded method before the existing three. Probably a latent bug in the reflection code, which wasn't explicitly asking for the method whose parameters matched those which it needed. In fact, i think this patch here would work with 0.13 too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151174413
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150704157
  
    **[Test build #44264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44264/consoleFull)** for PR 9232 at commit [`b417ca3`](https://github.com/apache/spark/commit/b417ca32d16edbb7c305d1c3922278999e302137).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43128655
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,117 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case e: NoClassDefFoundError =>
    +        logDebug(s"$service class not found", e)
    +      case e: InvocationTargetException =>
    +        // problem talking to the metastore or other hive-side exception
    +        logInfo(s"$service method invocation failed", e)
    +        throw if (e.getCause != null) e.getCause else e
    +      case e: ReflectiveOperationException =>
    +        // any other reflection failure log at error and rethrow
    +        logError(s"$service Class operation failed", e)
    +        throw e;
    --- End diff --
    
    `NoClassDefFound` may need to be covered too; I think it's the transient version of CNFE, though that may be superstition.
    
    I'd like to still unwind the `InvocationTargetException`, as its just a wrapper for what came in before; cutting it will simply reduce one level of needless stack trace


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150695706
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150701020
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by chesterxgchen <gi...@git.apache.org>.
Github user chesterxgchen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42811237
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -337,5 +339,82 @@ object YarnSparkHadoopUtil {
           conf.getInt("spark.executor.instances", targetNumExecutors)
         }
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore. Exceptions are caught and logged
    +   */
    +  def obtainTokenForHiveMetastore(
    +      sparkConf: SparkConf,
    +      conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(sparkConf, conf,
    +        UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: java.lang.NoSuchMethodException =>
    +        logInfo("Hive Method not found " + e)
    +        None
    +      case e: java.lang.ClassNotFoundException =>
    +        logInfo("Hive Class not found " + e)
    +        None
    +      case e: java.lang.NoClassDefFoundError =>
    +        logDebug("Hive Class not found: " + e)
    +        None
    +      case e: java.lang.ReflectiveOperationException =>
    +        // serious problem: log at error and continue
    +        logError("Hive Class operation failed: " + e)
    +        None
    +      case e: RuntimeException =>
    +        // any runtime exception, including Illegal Argument Exception
    +        throw e
    +      case e: Exception => {
    +        logError("Unexpected Exception " + e)
    +        throw new RuntimeException("Unexpected exception", e)
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Inner routine to obtains token for the Hive metastore; exceptions are raised on any problem.
    +   */
    +  private[yarn] def obtainTokenForHiveMetastoreInner(
    +      sparkConf: SparkConf,
    +      conf: Configuration,
    +      username: String): Option[Token[DelegationTokenIdentifier]] = {
    +    val mirror = universe.runtimeMirror(getClass.getClassLoader)
    +
    +    // the hive configuration class is a subclass of Hadoop Configuration
    --- End diff --
    
    the hive metadata uri should also in Hadoop Configuration, does it make sense directly check the metastore URI from conf instead trying to get it from HiveConf ? which could potentially get exception


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43223745
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,97 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case t: Throwable => {
    +        throw t
    --- End diff --
    
    The user can (i) not give Spark a hive configuration, in which case there will be no metastore URIs, and this code will be skipped, or (ii) set `spark.yarn.security.tokens.hive.enabled` to false.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150701221
  
    **[Test build #44264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44264/consoleFull)** for PR 9232 at commit [`b417ca3`](https://github.com/apache/spark/commit/b417ca32d16edbb7c305d1c3922278999e302137).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42952965
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +      fail(s"Expected an exception, got the token $token")
    --- End diff --
    
    As with the case below, I fail to see how the token can help in figuring out what was wrong. If a token is returned, you kinda know what it will be already. You're testing the fact that a token should not be returned at all, and an exception thrown instead. If the exception is not thrown (or the wrong one), that's the problem, and the returned token won't help in figuring out why.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152489569
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151655354
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43478397
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,81 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"Hive class not found $e")
    +        logDebug("Hive class not found", e)
    +        None
    +      case t: Throwable =>
    --- End diff --
    
    This is not needed...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150655438
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151677475
  
    **[Test build #44475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44475/consoleFull)** for PR 9232 at commit [`00cb5a7`](https://github.com/apache/spark/commit/00cb5a7323a4f91adfa5c4273c8a6bcbc67dc008).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:\n  * `        logInfo(s\"$service class not found $e\")`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150701035
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150688417
  
    Updated patch
    1. Moves all arg validation up before any hive class reflection
    1. gets all hive class methods before attempting to call any
    1. calls `Hive.closeCurrent()` in a finally block (if it could be loaded); catches and logs exceptions @ warn
    
    With this load-methods-first strategy, the `closeCurrent` variable will always contain a value prior to `Hive.get()` being invoked. As a result, `Hive.closeCurrent()` will always be invoked after `Hive.get()` method has been invoked (irrespective of outcome), so is guaranteed to trigger a cleanup once a Hive instance was created. Previously, any failure of `Hive.getDelegationToken()` would skip the cleanup. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150693689
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44257/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151173854
  
    -Updated patch pulls the `val token=` and `fail` code from the tests. It also pulls out the logic for deciding what to do with exceptions into its own class, which has two benefits.
    
    1. The logic can be tested directly, going through all the possible clauses of the match statement, and any conditions inside them.
    1. The hbase introspection code can be reworked to use the same method. This would guarantee one single implementation of logging & rethrow policy across the two (and any future) token acquisition clauses.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42944947
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +      fail(s"Expected an exception, got the token $token")
    --- End diff --
    
    wanted to include any token returned in the assert failure, on the basis that if something came back, it would be useful to find out what went wrong. `intercept`, just like JUnit's `@Test(expected=)` feature, picks up on the failure to raise the specific exception, but doesn't appear to say much else.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151808446
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by chesterxgchen <gi...@git.apache.org>.
Github user chesterxgchen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42810639
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -337,5 +339,82 @@ object YarnSparkHadoopUtil {
           conf.getInt("spark.executor.instances", targetNumExecutors)
         }
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore. Exceptions are caught and logged
    +   */
    +  def obtainTokenForHiveMetastore(
    +      sparkConf: SparkConf,
    +      conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(sparkConf, conf,
    +        UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: java.lang.NoSuchMethodException =>
    +        logInfo("Hive Method not found " + e)
    +        None
    +      case e: java.lang.ClassNotFoundException =>
    +        logInfo("Hive Class not found " + e)
    +        None
    +      case e: java.lang.NoClassDefFoundError =>
    +        logDebug("Hive Class not found: " + e)
    --- End diff --
    
    This may not be class not found exception, but it could be an initialization exception, where the HiveConf or Hadoop Conf is not available during class contruction


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151183059
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44358/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43489033
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +      fail(s"Expected an exception, got the token $token")
    +    }
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    +    }
    +    if (!inner.isInstanceOf[HiveException]) {
    +      fail(s"Not a hive exception", inner)
    --- End diff --
    
    done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43233534
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    --- End diff --
    
    minor, but you could use the same style as below (where you avoid the temp variable).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151808462
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43193976
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,99 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case t: Throwable => {
    --- End diff --
    
    Why would you? All you're doing here is printing the stack trace to stderr, which will happen again when you re-throw the exception.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43189656
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,117 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case e: NoClassDefFoundError =>
    +        logDebug(s"$service class not found", e)
    +      case e: InvocationTargetException =>
    +        // problem talking to the metastore or other hive-side exception
    +        logInfo(s"$service method invocation failed", e)
    +        throw if (e.getCause != null) e.getCause else e
    +      case e: ReflectiveOperationException =>
    +        // any other reflection failure log at error and rethrow
    +        logError(s"$service Class operation failed", e)
    +        throw e;
    --- End diff --
    
    well, that simplifies the clause, and the test...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42952937
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +      fail(s"Expected an exception, got the token $token")
    +    }
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    +    }
    +    if (!inner.isInstanceOf[HiveException]) {
    +      fail(s"Not a hive exception", inner)
    +    }
    +    // expect exception trapping code to unwind this hive-side exception
    +    intercept[HiveException] {
    +      val token = util.obtainTokenForHiveMetastore(hadoopConf)
    +      fail(s"Expected an exception, got the token $token")
    --- End diff --
    
    I fail to see how the contents of `token` are useful in diagnosing anything. The fact that you didn't get the expected exception is the only thing being tested get.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151174376
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152496951
  
    **[Test build #44676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44676/consoleFull)** for PR 9232 at commit [`ebb2b5a`](https://github.com/apache/spark/commit/ebb2b5abdf26c9e0a72452a47f8cd23b09e4339c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:\n  * `        logInfo(s\"Hive class not found $e\")`\n  * `        logDebug(\"Hive class not found\", e)`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42944952
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +      fail(s"Expected an exception, got the token $token")
    +    }
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    --- End diff --
    
    good point


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43488942
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,81 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"Hive class not found $e")
    +        logDebug("Hive class not found", e)
    +        None
    +      case t: Throwable =>
    +        throw t
    +    }
    +  }
    +
    +  /**
    +   * Inner routine to obtains token for the Hive metastore; exceptions are raised on any problem.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this.
    +   * @param username the username of the principal requesting the delegating token.
    +   * @return a delegation token
    +   */
    +  private[yarn] def obtainTokenForHiveMetastoreInner(conf: Configuration,
    +      username: String): Option[Token[DelegationTokenIdentifier]] = {
    +    val mirror = universe.runtimeMirror(Utils.getContextOrSparkClassLoader)
    +
    +    // the hive configuration class is a subclass of Hadoop Configuration, so can be cast down
    +    // to a Configuration and used without reflection
    +    val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
    +    // using the (Configuration, Class) constructor allows the current configuratin to be included
    +    // in the hive config.
    +    val ctor = hiveConfClass.getDeclaredConstructor(classOf[Configuration],
    +      classOf[Object].getClass)
    +    val hiveConf = ctor.newInstance(conf, hiveConfClass).asInstanceOf[Configuration]
    +    val metastoreUri = hiveConf.getTrimmed("hive.metastore.uris", "")
    +
    +    // Check for local metastore
    +    if (metastoreUri.nonEmpty) {
    +      require(username.nonEmpty, "Username undefined")
    +      val principalKey = "hive.metastore.kerberos.principal"
    +      val principal = hiveConf.getTrimmed(principalKey, "")
    +      require(principal.nonEmpty, "Hive principal $principalKey undefined")
    +      logDebug(s"Getting Hive delegation token for $username against $principal at $metastoreUri")
    +      val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
    +      val closeCurrent = hiveClass.getMethod("closeCurrent")
    +      try {
    +        // get all the instance methods before invoking any
    +        val getDelegationToken = hiveClass.getMethod("getDelegationToken",
    +          classOf[String], classOf[String])
    +        val getHive = hiveClass.getMethod("get", hiveConfClass)
    +
    +        // invoke
    +        val hive = getHive.invoke(null, hiveConf)
    +        val tokenStr = getDelegationToken.invoke(hive, username, principal)
    +          .asInstanceOf[java.lang.String]
    +        val hive2Token = new Token[DelegationTokenIdentifier]()
    +        hive2Token.decodeFromUrlString(tokenStr)
    +        Some(hive2Token)
    +      } finally {
    +        try {
    +          closeCurrent.invoke(null)
    +        } catch {
    +          case e: Exception => logWarning("In Hive.closeCurrent()", e)
    --- End diff --
    
    `Utils.tryLogNonFatalError` looks cleaner; switching


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150684229
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150581810
  
    **[Test build #44223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44223/consoleFull)** for PR 9232 at commit [`9630a9d`](https://github.com/apache/spark/commit/9630a9d80bc33f738dad6ebc841cb4aea058056d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151661827
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44469/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151655385
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150701247
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150366002
  
    **[Test build #44176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44176/consoleFull)** for PR 9232 at commit [`1fcb42e`](https://github.com/apache/spark/commit/1fcb42ec819052f1c3dd70c7543eaa824c3be23c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150579433
  
    **[Test build #44222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44222/consoleFull)** for PR 9232 at commit [`1248656`](https://github.com/apache/spark/commit/1248656b904dd9b2d965c89b57c1e96057b73640).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43488706
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -1337,55 +1337,9 @@ object Client extends Logging {
           conf: Configuration,
           credentials: Credentials) {
         if (shouldGetTokens(sparkConf, "hive") && UserGroupInformation.isSecurityEnabled) {
    -      val mirror = universe.runtimeMirror(getClass.getClassLoader)
    -
    -      try {
    -        val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
    -        val hiveConf = hiveConfClass.newInstance()
    -
    -        val hiveConfGet = (param: String) => Option(hiveConfClass
    -          .getMethod("get", classOf[java.lang.String])
    -          .invoke(hiveConf, param))
    -
    -        val metastore_uri = hiveConfGet("hive.metastore.uris")
    -
    -        // Check for local metastore
    -        if (metastore_uri != None && metastore_uri.get.toString.size > 0) {
    -          val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
    -          val hive = hiveClass.getMethod("get").invoke(null, hiveConf.asInstanceOf[Object])
    -
    -          val metastore_kerberos_principal_conf_var = mirror.classLoader
    -            .loadClass("org.apache.hadoop.hive.conf.HiveConf$ConfVars")
    -            .getField("METASTORE_KERBEROS_PRINCIPAL").get("varname").toString
    -
    -          val principal = hiveConfGet(metastore_kerberos_principal_conf_var)
    -
    -          val username = Option(UserGroupInformation.getCurrentUser().getUserName)
    -          if (principal != None && username != None) {
    -            val tokenStr = hiveClass.getMethod("getDelegationToken",
    -              classOf[java.lang.String], classOf[java.lang.String])
    -              .invoke(hive, username.get, principal.get).asInstanceOf[java.lang.String]
    -
    -            val hive2Token = new Token[DelegationTokenIdentifier]()
    -            hive2Token.decodeFromUrlString(tokenStr)
    -            credentials.addToken(new Text("hive.server2.delegation.token"), hive2Token)
    -            logDebug("Added hive.Server2.delegation.token to conf.")
    -            hiveClass.getMethod("closeCurrent").invoke(null)
    -          } else {
    -            logError("Username or principal == NULL")
    -            logError(s"""username=${username.getOrElse("(NULL)")}""")
    -            logError(s"""principal=${principal.getOrElse("(NULL)")}""")
    -            throw new IllegalArgumentException("username and/or principal is equal to null!")
    -          }
    -        } else {
    -          logDebug("HiveMetaStore configured in localmode")
    -        }
    -      } catch {
    -        case e: java.lang.NoSuchMethodException => { logInfo("Hive Method not found " + e); return }
    -        case e: java.lang.ClassNotFoundException => { logInfo("Hive Class not found " + e); return }
    -        case e: Exception => { logError("Unexpected Exception " + e)
    -          throw new RuntimeException("Unexpected exception", e)
    -        }
    +      val util = new YarnSparkHadoopUtil()
    --- End diff --
    
    done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42939246
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +      fail(s"Expected an exception, got the token $token")
    +    }
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    +    }
    +    if (!inner.isInstanceOf[HiveException]) {
    +      fail(s"Not a hive exception", inner)
    --- End diff --
    
    Again, use asserts.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151809791
  
    **[Test build #44522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44522/consoleFull)** for PR 9232 at commit [`217faba`](https://github.com/apache/spark/commit/217faba0d372ac66c57420372db62244e628da39).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150704233
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42897746
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -1337,55 +1337,8 @@ object Client extends Logging {
           conf: Configuration,
           credentials: Credentials) {
         if (shouldGetTokens(sparkConf, "hive") && UserGroupInformation.isSecurityEnabled) {
    --- End diff --
    
    As @tgravescs points out, the tokens are needed throughout the cluster, and yes, must be obtained irrespective of deployment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150695732
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152785103
  
    Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by chesterxgchen <gi...@git.apache.org>.
Github user chesterxgchen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42810729
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -337,5 +339,82 @@ object YarnSparkHadoopUtil {
           conf.getInt("spark.executor.instances", targetNumExecutors)
         }
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore. Exceptions are caught and logged
    +   */
    +  def obtainTokenForHiveMetastore(
    +      sparkConf: SparkConf,
    +      conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(sparkConf, conf,
    +        UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: java.lang.NoSuchMethodException =>
    +        logInfo("Hive Method not found " + e)
    +        None
    +      case e: java.lang.ClassNotFoundException =>
    +        logInfo("Hive Class not found " + e)
    +        None
    +      case e: java.lang.NoClassDefFoundError =>
    +        logDebug("Hive Class not found: " + e)
    --- End diff --
    
    I like the method return Option


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43231831
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,99 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case t: Throwable => {
    --- End diff --
    
    Hi @steveloughran ,
    
    I think you're still not really getting what I'm saying. You can just *delete this whole `case`*. The exception will just propagate up the call stack


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150361708
  
    **[Test build #44176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44176/consoleFull)** for PR 9232 at commit [`1fcb42e`](https://github.com/apache/spark/commit/1fcb42ec819052f1c3dd70c7543eaa824c3be23c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43478640
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +      fail(s"Expected an exception, got the token $token")
    +    }
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    +    }
    +    if (!inner.isInstanceOf[HiveException]) {
    +      fail(s"Not a hive exception", inner)
    --- End diff --
    
    nit: nothing to interpolate, can drop the `s`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151182699
  
    **[Test build #44358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44358/consoleFull)** for PR 9232 at commit [`0356c4c`](https://github.com/apache/spark/commit/0356c4c6f9356ecd4234dbe6a6f1b2f7b38b5e14).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:\n  * `        logInfo(s\"$service class not found $e\")`\n  * `        logDebug(s\"$service class not found\", e)`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150688788
  
    **[Test build #44257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44257/consoleFull)** for PR 9232 at commit [`9713cf5`](https://github.com/apache/spark/commit/9713cf5c2f69a07f8ed03d64a0cd3956350ed232).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43193733
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,99 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case t: Throwable => {
    --- End diff --
    
    Either?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43439849
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    --- End diff --
    
    done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43160341
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,117 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case e: NoClassDefFoundError =>
    +        logDebug(s"$service class not found", e)
    +      case e: InvocationTargetException =>
    +        // problem talking to the metastore or other hive-side exception
    +        logInfo(s"$service method invocation failed", e)
    +        throw if (e.getCause != null) e.getCause else e
    +      case e: ReflectiveOperationException =>
    +        // any other reflection failure log at error and rethrow
    +        logError(s"$service Class operation failed", e)
    +        throw e;
    --- End diff --
    
    I think unwinding just makes the code more confusing. Just handle the exceptions you really mean to handle, and let the others propagate as is. Errors here should be very uncommon, and the extra info in the stack trace won't really make it harder to find the case.
    
    I think it's very unlikely you'll get a `NoClassDefFoundError` if you don't get a `ClassDefNotFoundException`. If you do, it's probably an actual error - user has added some Hive jars to the path but not others, or something like that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150811907
  
    **[Test build #44296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44296/consoleFull)** for PR 9232 at commit [`89edfa6`](https://github.com/apache/spark/commit/89edfa649ef82e126f492df0d521dd7d3079ad76).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42866295
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -337,5 +339,82 @@ object YarnSparkHadoopUtil {
           conf.getInt("spark.executor.instances", targetNumExecutors)
         }
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore. Exceptions are caught and logged
    +   */
    +  def obtainTokenForHiveMetastore(
    --- End diff --
    
    it does; if None is returned then it means that either the settings say "no need for a token" or the operation failed in a way (no hive on classpath) which was considered not a problem. The latest patch will throw an exception on things like hive metastore connectivity errors.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42939231
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,104 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: NoSuchMethodException =>
    +        logInfo("Hive Method not found", e)
    +        None
    +      case e: ClassNotFoundException =>
    --- End diff --
    
    I kinda feel that everything but this exception should be propagated. That's really the only valid error you might expect; someone running Spark without Hive classes, and even that might change (I think there's a bug tracking removal of the hive profile from the maven build).
    
    Every other error means that either you have the wrong version of Hive in the classpath (user error) or that your configuration says you need delegation tokens, but you can't get them for some reason.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150700891
  
    I've tightened this up and am happy with the code now, except I'm not sure that NoSuchMethodExceptions should be downgraded. It will hide the situation of incompatible Hive version on the CP yet principal required to talk to the hive metastore. Provided the spark app doesn't want to talk to hive, it's not going to matter. But if the app does want't to talk to hive, there'll be no cue except for a log message at launch time, and failures in executors whenever they try to set up an RPC connection. Leaving it as is retains backwards compatibility though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43488794
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,81 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"Hive class not found $e")
    +        logDebug("Hive class not found", e)
    +        None
    +      case t: Throwable =>
    +        throw t
    +    }
    +  }
    +
    +  /**
    +   * Inner routine to obtains token for the Hive metastore; exceptions are raised on any problem.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this.
    +   * @param username the username of the principal requesting the delegating token.
    +   * @return a delegation token
    +   */
    +  private[yarn] def obtainTokenForHiveMetastoreInner(conf: Configuration,
    +      username: String): Option[Token[DelegationTokenIdentifier]] = {
    +    val mirror = universe.runtimeMirror(Utils.getContextOrSparkClassLoader)
    +
    +    // the hive configuration class is a subclass of Hadoop Configuration, so can be cast down
    +    // to a Configuration and used without reflection
    +    val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
    +    // using the (Configuration, Class) constructor allows the current configuratin to be included
    +    // in the hive config.
    +    val ctor = hiveConfClass.getDeclaredConstructor(classOf[Configuration],
    +      classOf[Object].getClass)
    +    val hiveConf = ctor.newInstance(conf, hiveConfClass).asInstanceOf[Configuration]
    +    val metastoreUri = hiveConf.getTrimmed("hive.metastore.uris", "")
    +
    +    // Check for local metastore
    +    if (metastoreUri.nonEmpty) {
    +      require(username.nonEmpty, "Username undefined")
    +      val principalKey = "hive.metastore.kerberos.principal"
    +      val principal = hiveConf.getTrimmed(principalKey, "")
    +      require(principal.nonEmpty, "Hive principal $principalKey undefined")
    +      logDebug(s"Getting Hive delegation token for $username against $principal at $metastoreUri")
    +      val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
    +      val closeCurrent = hiveClass.getMethod("closeCurrent")
    +      try {
    +        // get all the instance methods before invoking any
    +        val getDelegationToken = hiveClass.getMethod("getDelegationToken",
    +          classOf[String], classOf[String])
    +        val getHive = hiveClass.getMethod("get", hiveConfClass)
    +
    +        // invoke
    +        val hive = getHive.invoke(null, hiveConf)
    +        val tokenStr = getDelegationToken.invoke(hive, username, principal)
    +          .asInstanceOf[java.lang.String]
    --- End diff --
    
    copied from the original...cut that and joined the lines


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150655380
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43014259
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,117 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case e: NoClassDefFoundError =>
    +        logDebug(s"$service class not found", e)
    +      case e: InvocationTargetException =>
    +        // problem talking to the metastore or other hive-side exception
    +        logInfo(s"$service method invocation failed", e)
    +        throw if (e.getCause != null) e.getCause else e
    +      case e: ReflectiveOperationException =>
    +        // any other reflection failure log at error and rethrow
    +        logError(s"$service Class operation failed", e)
    +        throw e;
    +      case e: RuntimeException =>
    +        // any runtime exception, including Illegal Argument Exception
    +        throw e
    +      case t: Throwable => {
    +        val msg = s"$service: Unexpected Exception " + t
    +        logError(msg, t)
    +        throw new RuntimeException(msg, t)
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Inner routine to obtains token for the Hive metastore; exceptions are raised on any problem.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this.
    +   * @param username the username of the principal requesting the delegating token.
    +   * @return a delegation token
    +   */
    +  private[yarn] def obtainTokenForHiveMetastoreInner(conf: Configuration,
    +      username: String): Option[Token[DelegationTokenIdentifier]] = {
    +    val mirror = universe.runtimeMirror(getClass.getClassLoader)
    +
    +    // the hive configuration class is a subclass of Hadoop Configuration, so can be cast down
    +    // to a Configuration and used without reflection
    +    val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
    +    // using the (Configuration, Class) constructor allows the current configuratin to be included
    +    // in the hive config.
    +    val ctor = hiveConfClass.getDeclaredConstructor(classOf[Configuration],
    +      classOf[Object].getClass)
    +    val hiveConf = ctor.newInstance(conf, hiveConfClass).asInstanceOf[Configuration]
    +    val metastore_uri = hiveConf.getTrimmed("hive.metastore.uris", "")
    --- End diff --
    
    `metastoreUri`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152334258
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43478556
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,31 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    --- End diff --
    
    `YarnSparkHadoopUtil.get`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42897540
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -322,8 +322,10 @@ private[spark] class Client(
         // multiple times, YARN will fail to launch containers for the app with an internal
         // error.
         val distributedUris = new HashSet[String]
    -    obtainTokenForHiveMetastore(sparkConf, hadoopConf, credentials)
    -    obtainTokenForHBase(sparkConf, hadoopConf, credentials)
    +    if (isClusterMode) {
    --- End diff --
    
    suggestion from @dougb , but yes, I didn't think that through. It's not for the driver, its for the nodes. Will fix


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150870381
  
    Is the PR description still accurate? It still says there's no fix here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43242989
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +    }
    +    assertNestedHiveException(e)
    +    // expect exception trapping code to unwind this hive-side exception
    +    assertNestedHiveException(intercept[InvocationTargetException] {
    +      util.obtainTokenForHiveMetastore(hadoopConf)
    +    })
    +  }
    +
    +  def assertNestedHiveException(e: InvocationTargetException): Throwable = {
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    +    }
    +    if (!inner.isInstanceOf[HiveException]) {
    +      fail(s"Not a hive exception", inner)
    +    }
    +    inner
    +  }
    +
    +  test("handleTokenIntrospectionFailure") {
    +    val util = new YarnSparkHadoopUtil
    +    // downgraded exceptions
    +    util.handleTokenIntrospectionFailure("hive", new ClassNotFoundException("cnfe"))
    --- End diff --
    
    As soon as this patch is in I'll  turn to [SPARK-11317](https://issues.apache.org/jira/browse/SPARK-11317), which is essentially "apply the same catching, filtering and reporting strategy for HBase tokens as for Hive ones". It's not as critical as this one (token retrieval is working), but as nothing gets logged except "InvocationTargetException" with no stack trace, trying to recognise the issue is a Kerberos auth problem, let alone trying to fix it, is a weekend's effort, rather than 20 minutes worth. 
    
    Because the policy goes in both places, having it separate and re-usable makes it a zero-cut-and-paste reuse, with that single test for failures without having to mock up failures across two separate clauses. And future maintenance costs are kept down if someone ever decides to change the policy again.
    
    Would you be happier if I cleaned up the HBase code as part of this same patch? Because I can and it will make the benefits of the factored out behaviour clearer. It's just messy to fix two things in one patch, especially if someone ever needs to play cherry pick or reverting games.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152453057
  
    Just some minor things left to clean up, otherwise looks ok.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150991666
  
    The description still says nothing is being fixed by this PR, is that still accurate? (Note title != description.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43195162
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,99 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case t: Throwable => {
    --- End diff --
    
    I don't follow. You're catching an exception, logging it, and re-throwing it, which causes the exception to show up twice in the process output.
    
    Instead, you can just delete your code, and let the exception propagate naturally. It will show up in the output the same way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151673409
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151657516
  
    **[Test build #44469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44469/consoleFull)** for PR 9232 at commit [`fbf0ecb`](https://github.com/apache/spark/commit/fbf0ecbd4ae4303846608193c91d735f536e9015).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152815974
  
    Will do. Same JIRA or a new backport one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43577485
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,31 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    --- End diff --
    
    nah, it's ok to leave as is.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152489542
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150579870
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151661688
  
    **[Test build #44469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44469/consoleFull)** for PR 9232 at commit [`fbf0ecb`](https://github.com/apache/spark/commit/fbf0ecbd4ae4303846608193c91d735f536e9015).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:\n  * `        logInfo(s\"$service class not found $e\")`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150693568
  
    **[Test build #44257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44257/consoleFull)** for PR 9232 at commit [`9713cf5`](https://github.com/apache/spark/commit/9713cf5c2f69a07f8ed03d64a0cd3956350ed232).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150701155
  
    **[Test build #44260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44260/consoleFull)** for PR 9232 at commit [`bc6ea26`](https://github.com/apache/spark/commit/bc6ea26a139d377dcb8fbb27a02284b0b27454cf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150701248
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44260/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43478370
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -1337,55 +1337,9 @@ object Client extends Logging {
           conf: Configuration,
           credentials: Credentials) {
         if (shouldGetTokens(sparkConf, "hive") && UserGroupInformation.isSecurityEnabled) {
    -      val mirror = universe.runtimeMirror(getClass.getClassLoader)
    -
    -      try {
    -        val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
    -        val hiveConf = hiveConfClass.newInstance()
    -
    -        val hiveConfGet = (param: String) => Option(hiveConfClass
    -          .getMethod("get", classOf[java.lang.String])
    -          .invoke(hiveConf, param))
    -
    -        val metastore_uri = hiveConfGet("hive.metastore.uris")
    -
    -        // Check for local metastore
    -        if (metastore_uri != None && metastore_uri.get.toString.size > 0) {
    -          val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
    -          val hive = hiveClass.getMethod("get").invoke(null, hiveConf.asInstanceOf[Object])
    -
    -          val metastore_kerberos_principal_conf_var = mirror.classLoader
    -            .loadClass("org.apache.hadoop.hive.conf.HiveConf$ConfVars")
    -            .getField("METASTORE_KERBEROS_PRINCIPAL").get("varname").toString
    -
    -          val principal = hiveConfGet(metastore_kerberos_principal_conf_var)
    -
    -          val username = Option(UserGroupInformation.getCurrentUser().getUserName)
    -          if (principal != None && username != None) {
    -            val tokenStr = hiveClass.getMethod("getDelegationToken",
    -              classOf[java.lang.String], classOf[java.lang.String])
    -              .invoke(hive, username.get, principal.get).asInstanceOf[java.lang.String]
    -
    -            val hive2Token = new Token[DelegationTokenIdentifier]()
    -            hive2Token.decodeFromUrlString(tokenStr)
    -            credentials.addToken(new Text("hive.server2.delegation.token"), hive2Token)
    -            logDebug("Added hive.Server2.delegation.token to conf.")
    -            hiveClass.getMethod("closeCurrent").invoke(null)
    -          } else {
    -            logError("Username or principal == NULL")
    -            logError(s"""username=${username.getOrElse("(NULL)")}""")
    -            logError(s"""principal=${principal.getOrElse("(NULL)")}""")
    -            throw new IllegalArgumentException("username and/or principal is equal to null!")
    -          }
    -        } else {
    -          logDebug("HiveMetaStore configured in localmode")
    -        }
    -      } catch {
    -        case e: java.lang.NoSuchMethodException => { logInfo("Hive Method not found " + e); return }
    -        case e: java.lang.ClassNotFoundException => { logInfo("Hive Class not found " + e); return }
    -        case e: Exception => { logError("Unexpected Exception " + e)
    -          throw new RuntimeException("Unexpected exception", e)
    -        }
    +      val util = new YarnSparkHadoopUtil()
    --- End diff --
    
    This should be `YarnSparkHadoopUtil.get`; you could also avoid the `val` altogether.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42866102
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -1337,55 +1337,8 @@ object Client extends Logging {
           conf: Configuration,
           credentials: Credentials) {
         if (shouldGetTokens(sparkConf, "hive") && UserGroupInformation.isSecurityEnabled) {
    --- End diff --
    
    yeah, probably does. In fact, getting a delegation token there is potentially worse than useless, as that DT will eventually expire. Using the keytab direct will, provided the user keeps logging on, stay valid.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43201346
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,99 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case t: Throwable => {
    --- End diff --
    
    oh, we're at cross purposes. I was looking @ line 128, above. you were at 180. That's why I Was confused. Yes, I'll cut the lower


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42939244
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +      fail(s"Expected an exception, got the token $token")
    +    }
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    --- End diff --
    
    This is why we have asserts.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by dougb <gi...@git.apache.org>.
Github user dougb commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42809360
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -1337,55 +1337,8 @@ object Client extends Logging {
           conf: Configuration,
           credentials: Credentials) {
         if (shouldGetTokens(sparkConf, "hive") && UserGroupInformation.isSecurityEnabled) {
    --- End diff --
    
    Does it make sense to add a check for `isClusterMode` too  ?
    I believe we don't need a delegation token if we are running in client mode.
    The driver will have access to the submitting user's kerberos tokens. 
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150589711
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44223/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43189502
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,117 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case e: NoClassDefFoundError =>
    +        logDebug(s"$service class not found", e)
    +      case e: InvocationTargetException =>
    +        // problem talking to the metastore or other hive-side exception
    +        logInfo(s"$service method invocation failed", e)
    +        throw if (e.getCause != null) e.getCause else e
    +      case e: ReflectiveOperationException =>
    +        // any other reflection failure log at error and rethrow
    +        logError(s"$service Class operation failed", e)
    +        throw e;
    +      case e: RuntimeException =>
    +        // any runtime exception, including Illegal Argument Exception
    +        throw e
    +      case t: Throwable => {
    +        val msg = s"$service: Unexpected Exception " + t
    +        logError(msg, t)
    +        throw new RuntimeException(msg, t)
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Inner routine to obtains token for the Hive metastore; exceptions are raised on any problem.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this.
    +   * @param username the username of the principal requesting the delegating token.
    +   * @return a delegation token
    +   */
    +  private[yarn] def obtainTokenForHiveMetastoreInner(conf: Configuration,
    +      username: String): Option[Token[DelegationTokenIdentifier]] = {
    +    val mirror = universe.runtimeMirror(getClass.getClassLoader)
    +
    +    // the hive configuration class is a subclass of Hadoop Configuration, so can be cast down
    +    // to a Configuration and used without reflection
    +    val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
    +    // using the (Configuration, Class) constructor allows the current configuratin to be included
    +    // in the hive config.
    +    val ctor = hiveConfClass.getDeclaredConstructor(classOf[Configuration],
    +      classOf[Object].getClass)
    +    val hiveConf = ctor.newInstance(conf, hiveConfClass).asInstanceOf[Configuration]
    +    val metastore_uri = hiveConf.getTrimmed("hive.metastore.uris", "")
    +
    +    // Check for local metastore
    +    if (metastore_uri.nonEmpty) {
    +      if (username.isEmpty) {
    +        throw new IllegalArgumentException(s"Username undefined")
    +      }
    +      val metastore_kerberos_principal_key = "hive.metastore.kerberos.principal"
    +      val principal = hiveConf.getTrimmed(metastore_kerberos_principal_key, "")
    +      if (principal.isEmpty) {
    +        throw new IllegalArgumentException(s"Hive principal" +
    +            s" $metastore_kerberos_principal_key undefined")
    +      }
    +      logDebug(s"Getting Hive delegation token for user $username against $metastore_uri")
    +      val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
    +      val closeCurrent = hiveClass.getMethod("closeCurrent")
    +      try {
    +        // get all the instance methods before invoking any
    +        val getDelegationToken = hiveClass.getMethod("getDelegationToken",
    +          classOf[String], classOf[String])
    +        val getHive = hiveClass.getMethod("get", hiveConfClass)
    +
    +        // invoke
    +        val hive = getHive.invoke(null, hiveConf)
    +        val tokenStr = getDelegationToken.invoke(hive, username, principal)
    +            .asInstanceOf[java.lang.String]
    +        val hive2Token = new Token[DelegationTokenIdentifier]()
    +        hive2Token.decodeFromUrlString(tokenStr)
    +        Some(hive2Token)
    +      } finally {
    +        try {
    +          closeCurrent.invoke(null)
    +        } catch {
    +          case e: Exception => logWarning("In Hive.closeCurrent()", e)
    +        }
    +      }
    +    } else {
    +      logDebug("HiveMetaStore configured in localmode")
    --- End diff --
    
    I have no idea, but it's what the original code said.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43251011
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +    }
    +    assertNestedHiveException(e)
    +    // expect exception trapping code to unwind this hive-side exception
    +    assertNestedHiveException(intercept[InvocationTargetException] {
    +      util.obtainTokenForHiveMetastore(hadoopConf)
    +    })
    +  }
    +
    +  def assertNestedHiveException(e: InvocationTargetException): Throwable = {
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    +    }
    +    if (!inner.isInstanceOf[HiveException]) {
    +      fail(s"Not a hive exception", inner)
    +    }
    +    inner
    +  }
    +
    +  test("handleTokenIntrospectionFailure") {
    +    val util = new YarnSparkHadoopUtil
    +    // downgraded exceptions
    +    util.handleTokenIntrospectionFailure("hive", new ClassNotFoundException("cnfe"))
    --- End diff --
    
    I think that because there's really only one exception that's currently interesting, you need more code to implement this "shared policy" approach than just catching the one interesting exception in each call site. It's true that if you need to modify the policy you'd need you'd need to duplicate code (or switch to your current approach), but then do you envision needing to do that? What if the policy for each service needs to be different?
    
    Personally I think that the current approach is a little confusing for someone reading the code (and inconsistent; for example the current code catches `Exception` and then feeds it to a method that matches on `Throwable`), and because the policy is so simple, the sharing argument doesn't justify making the code harder to follow.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150585976
  
    **[Test build #44222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44222/consoleFull)** for PR 9232 at commit [`1248656`](https://github.com/apache/spark/commit/1248656b904dd9b2d965c89b57c1e96057b73640).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150704237
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44264/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152864441
  
    > Same JIRA or a new backport one?
    
    You could use the same one; I marked it as resolved but it's ok to reopen it for the backport.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43014505
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,117 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case e: NoClassDefFoundError =>
    +        logDebug(s"$service class not found", e)
    +      case e: InvocationTargetException =>
    +        // problem talking to the metastore or other hive-side exception
    +        logInfo(s"$service method invocation failed", e)
    +        throw if (e.getCause != null) e.getCause else e
    +      case e: ReflectiveOperationException =>
    +        // any other reflection failure log at error and rethrow
    +        logError(s"$service Class operation failed", e)
    +        throw e;
    +      case e: RuntimeException =>
    +        // any runtime exception, including Illegal Argument Exception
    +        throw e
    +      case t: Throwable => {
    +        val msg = s"$service: Unexpected Exception " + t
    +        logError(msg, t)
    +        throw new RuntimeException(msg, t)
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Inner routine to obtains token for the Hive metastore; exceptions are raised on any problem.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this.
    +   * @param username the username of the principal requesting the delegating token.
    +   * @return a delegation token
    +   */
    +  private[yarn] def obtainTokenForHiveMetastoreInner(conf: Configuration,
    +      username: String): Option[Token[DelegationTokenIdentifier]] = {
    +    val mirror = universe.runtimeMirror(getClass.getClassLoader)
    +
    +    // the hive configuration class is a subclass of Hadoop Configuration, so can be cast down
    +    // to a Configuration and used without reflection
    +    val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
    +    // using the (Configuration, Class) constructor allows the current configuratin to be included
    +    // in the hive config.
    +    val ctor = hiveConfClass.getDeclaredConstructor(classOf[Configuration],
    +      classOf[Object].getClass)
    +    val hiveConf = ctor.newInstance(conf, hiveConfClass).asInstanceOf[Configuration]
    +    val metastore_uri = hiveConf.getTrimmed("hive.metastore.uris", "")
    +
    +    // Check for local metastore
    +    if (metastore_uri.nonEmpty) {
    +      if (username.isEmpty) {
    +        throw new IllegalArgumentException(s"Username undefined")
    +      }
    +      val metastore_kerberos_principal_key = "hive.metastore.kerberos.principal"
    +      val principal = hiveConf.getTrimmed(metastore_kerberos_principal_key, "")
    +      if (principal.isEmpty) {
    +        throw new IllegalArgumentException(s"Hive principal" +
    +            s" $metastore_kerberos_principal_key undefined")
    +      }
    +      logDebug(s"Getting Hive delegation token for user $username against $metastore_uri")
    +      val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
    +      val closeCurrent = hiveClass.getMethod("closeCurrent")
    +      try {
    +        // get all the instance methods before invoking any
    +        val getDelegationToken = hiveClass.getMethod("getDelegationToken",
    +          classOf[String], classOf[String])
    +        val getHive = hiveClass.getMethod("get", hiveConfClass)
    +
    +        // invoke
    +        val hive = getHive.invoke(null, hiveConf)
    +        val tokenStr = getDelegationToken.invoke(hive, username, principal)
    +            .asInstanceOf[java.lang.String]
    +        val hive2Token = new Token[DelegationTokenIdentifier]()
    +        hive2Token.decodeFromUrlString(tokenStr)
    +        Some(hive2Token)
    +      } finally {
    +        try {
    +          closeCurrent.invoke(null)
    +        } catch {
    +          case e: Exception => logWarning("In Hive.closeCurrent()", e)
    +        }
    +      }
    +    } else {
    +      logDebug("HiveMetaStore configured in localmode")
    --- End diff --
    
    "local mode"?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150662642
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44244/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150811758
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43004016
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +      fail(s"Expected an exception, got the token $token")
    --- End diff --
    
    I'm a firm believer in not losing information that may help debug failures. If you think token is worthless, I'll pull it, but in the other checks I will try and preserve stack traces, rather than raise new asserts


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by tedyu <gi...@git.apache.org>.
Github user tedyu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43583071
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,76 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"Hive class not found $e")
    +        logDebug("Hive class not found", e)
    --- End diff --
    
    Why double log the message ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42939252
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    --- End diff --
    
    `token` is not used (also below).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150662640
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150577860
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150658089
  
    Patch with revert posted
    
    For reviewers, here some things to consider
    
    1. Which exceptions should trigger log & skip for token, vs which are critical enough to fail the process. The patch ignores no classpath found (`NoClassDefFoundError` and `ClassNotFoundException`) (i.e. hive not on the CP), but also `NoSuchMethodException`. Which means that even if the security settings require a token, if there's some method incompatibility, it will be ignored.
    1. the Hadoop config now passed down is used to help define the hive configuration used in the binding. This is needed for testing. Previously, an empty `HiveConf` instance was created: enough to force in the hive-default.xml and hive-site.xml, but not able to pick on any custom options. This is a change: does it constitute a regression, or a feature?
    
    Also, I've just realised the final`closeCurrent()` call must be in a finally(); will add


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42881320
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -322,8 +322,10 @@ private[spark] class Client(
         // multiple times, YARN will fail to launch containers for the app with an internal
         // error.
         val distributedUris = new HashSet[String]
    -    obtainTokenForHiveMetastore(sparkConf, hadoopConf, credentials)
    -    obtainTokenForHBase(sparkConf, hadoopConf, credentials)
    +    if (isClusterMode) {
    --- End diff --
    
    why is this cluster mode only?  I can run spark shell to access hive or hbase and this won't get tokens for those to ship to executors?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150579839
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9232


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151132336
  
    can you update jira and this pr to include which version of hive changed the interface?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152325046
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43003603
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    --- End diff --
    
    It's used in the error text raised in the `fail()`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43233453
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,55 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +    }
    +    assertNestedHiveException(e)
    +    // expect exception trapping code to unwind this hive-side exception
    +    assertNestedHiveException(intercept[InvocationTargetException] {
    +      util.obtainTokenForHiveMetastore(hadoopConf)
    +    })
    +  }
    +
    +  def assertNestedHiveException(e: InvocationTargetException): Throwable = {
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    +    }
    +    if (!inner.isInstanceOf[HiveException]) {
    +      fail(s"Not a hive exception", inner)
    +    }
    +    inner
    +  }
    +
    +  test("handleTokenIntrospectionFailure") {
    +    val util = new YarnSparkHadoopUtil
    +    // downgraded exceptions
    +    util.handleTokenIntrospectionFailure("hive", new ClassNotFoundException("cnfe"))
    --- End diff --
    
    Following my previous comment, you could get rid of this whole test case if you just do exception handling in the caller method. If you really want to test that CNFE is ignored, you could use Mockito's `spy` to mock `obtainTokenForHiveMetastoreInner` and make it throw a CNFE. The other tests here are not really that interesting anymore.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150577826
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152497071
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44676/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42939248
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +      fail(s"Expected an exception, got the token $token")
    +    }
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    +    }
    +    if (!inner.isInstanceOf[HiveException]) {
    +      fail(s"Not a hive exception", inner)
    +    }
    +    // expect exception trapping code to unwind this hive-side exception
    +    intercept[HiveException] {
    +      val token = util.obtainTokenForHiveMetastore(hadoopConf)
    +      fail(s"Expected an exception, got the token $token")
    --- End diff --
    
    Again, that's what `intercept` is for.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150586251
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44222/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150579863
  
    -just pushed up a version with @dougb's suggestion; the check for cluster mode is made before trying to obtain either hive or hbase tokens.
    
    Note that the HBase token acquisition is very similar to the hive one, and could also be factored out, including sharing the same exception-handling function as the hive one. And again, testable, though that would need Hbase on the test classpath.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-11265 hive tokens

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150697465
  
    **[Test build #44260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44260/consoleFull)** for PR 9232 at commit [`bc6ea26`](https://github.com/apache/spark/commit/bc6ea26a139d377dcb8fbb27a02284b0b27454cf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-151677560
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44475/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152490177
  
    **[Test build #44676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44676/consoleFull)** for PR 9232 at commit [`ebb2b5a`](https://github.com/apache/spark/commit/ebb2b5abdf26c9e0a72452a47f8cd23b09e4339c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43478447
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,81 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"Hive class not found $e")
    +        logDebug("Hive class not found", e)
    +        None
    +      case t: Throwable =>
    +        throw t
    +    }
    +  }
    +
    +  /**
    +   * Inner routine to obtains token for the Hive metastore; exceptions are raised on any problem.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this.
    +   * @param username the username of the principal requesting the delegating token.
    +   * @return a delegation token
    +   */
    +  private[yarn] def obtainTokenForHiveMetastoreInner(conf: Configuration,
    +      username: String): Option[Token[DelegationTokenIdentifier]] = {
    +    val mirror = universe.runtimeMirror(Utils.getContextOrSparkClassLoader)
    +
    +    // the hive configuration class is a subclass of Hadoop Configuration, so can be cast down
    +    // to a Configuration and used without reflection
    +    val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
    +    // using the (Configuration, Class) constructor allows the current configuratin to be included
    +    // in the hive config.
    +    val ctor = hiveConfClass.getDeclaredConstructor(classOf[Configuration],
    +      classOf[Object].getClass)
    +    val hiveConf = ctor.newInstance(conf, hiveConfClass).asInstanceOf[Configuration]
    +    val metastoreUri = hiveConf.getTrimmed("hive.metastore.uris", "")
    +
    +    // Check for local metastore
    +    if (metastoreUri.nonEmpty) {
    +      require(username.nonEmpty, "Username undefined")
    +      val principalKey = "hive.metastore.kerberos.principal"
    +      val principal = hiveConf.getTrimmed(principalKey, "")
    +      require(principal.nonEmpty, "Hive principal $principalKey undefined")
    +      logDebug(s"Getting Hive delegation token for $username against $principal at $metastoreUri")
    +      val hiveClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.ql.metadata.Hive")
    +      val closeCurrent = hiveClass.getMethod("closeCurrent")
    +      try {
    +        // get all the instance methods before invoking any
    +        val getDelegationToken = hiveClass.getMethod("getDelegationToken",
    +          classOf[String], classOf[String])
    +        val getHive = hiveClass.getMethod("get", hiveConfClass)
    +
    +        // invoke
    +        val hive = getHive.invoke(null, hiveConf)
    +        val tokenStr = getDelegationToken.invoke(hive, username, principal)
    +          .asInstanceOf[java.lang.String]
    --- End diff --
    
    minor: is `java.lang.` necessary here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152786237
  
    Hi @steveloughran, this patch doesn't merge cleanly to branch-1.5. If you want to apply it there, could you send a new pr for that branch? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-152334259
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44637/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Stevel/patches/spark 11265 hive tokens

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9232#issuecomment-150360049
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r42945056
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    +    val e = intercept[InvocationTargetException] {
    +      val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
    +      fail(s"Expected an exception, got the token $token")
    +    }
    +    val inner = e.getCause
    +    if (inner == null) {
    +      fail("No inner cause", e)
    +    }
    +    if (!inner.isInstanceOf[HiveException]) {
    +      fail(s"Not a hive exception", inner)
    +    }
    +    // expect exception trapping code to unwind this hive-side exception
    +    intercept[HiveException] {
    +      val token = util.obtainTokenForHiveMetastore(hadoopConf)
    +      fail(s"Expected an exception, got the token $token")
    --- End diff --
    
    again, better diagnostics. Now, if you look at `Assertions.intercept()`, it's clearly discarding the result of the inner closure. If it  was redone to take an error string on a mismatch, and prepend that to the `.toString()` value of anything returned which wasn't of type `Unit`, then we'd have something really good at reporting failures to raise exceptions. As it is, `intercept` is pretty lossy.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43488997
  
    --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala ---
    @@ -245,4 +247,31 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite with Matchers with Logging
           System.clearProperty("SPARK_YARN_MODE")
         }
       }
    +
    +  test("Obtain tokens For HiveMetastore") {
    +    val hadoopConf = new Configuration()
    +    hadoopConf.set("hive.metastore.kerberos.principal", "bob")
    +    // thrift picks up on port 0 and bails out, without trying to talk to endpoint
    +    hadoopConf.set("hive.metastore.uris", "http://localhost:0")
    +    val util = new YarnSparkHadoopUtil
    --- End diff --
    
    all the other tests do the same; if you want a switch it may as well be across the suite


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11265] [YARN] YarnClient can't get toke...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9232#discussion_r43014446
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
    @@ -142,6 +145,117 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
         val containerIdString = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
         ConverterUtils.toContainerId(containerIdString)
       }
    +
    +  /**
    +   * Obtains token for the Hive metastore, using the current user as the principal.
    +   * Some exceptions are caught and downgraded to a log message.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this
    +   * @return a token, or `None` if there's no need for a token (no metastore URI or principal
    +   *         in the config), or if a binding exception was caught and downgraded.
    +   */
    +  def obtainTokenForHiveMetastore(conf: Configuration): Option[Token[DelegationTokenIdentifier]] = {
    +    try {
    +      obtainTokenForHiveMetastoreInner(conf, UserGroupInformation.getCurrentUser().getUserName)
    +    } catch {
    +      case e: Exception => {
    +        handleTokenIntrospectionFailure("Hive", e)
    +        None
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Handle failures to obtain a token through introspection. Failures to load the class are
    +   * not treated as errors: anything else is.
    +   * @param service service name for error messages
    +   * @param thrown exception caught
    +   * @throws Exception if the `thrown` exception isn't one that is to be ignored
    +   */
    +  private[yarn] def handleTokenIntrospectionFailure(service: String, thrown: Throwable): Unit = {
    +    thrown match {
    +      case e: ClassNotFoundException =>
    +        logInfo(s"$service class not found $e")
    +        logDebug("Hive Class not found", e)
    +      case e: NoClassDefFoundError =>
    +        logDebug(s"$service class not found", e)
    +      case e: InvocationTargetException =>
    +        // problem talking to the metastore or other hive-side exception
    +        logInfo(s"$service method invocation failed", e)
    +        throw if (e.getCause != null) e.getCause else e
    +      case e: ReflectiveOperationException =>
    +        // any other reflection failure log at error and rethrow
    +        logError(s"$service Class operation failed", e)
    +        throw e;
    +      case e: RuntimeException =>
    +        // any runtime exception, including Illegal Argument Exception
    +        throw e
    +      case t: Throwable => {
    +        val msg = s"$service: Unexpected Exception " + t
    +        logError(msg, t)
    +        throw new RuntimeException(msg, t)
    +      }
    +    }
    +  }
    +
    +  /**
    +   * Inner routine to obtains token for the Hive metastore; exceptions are raised on any problem.
    +   * @param conf hadoop configuration; the Hive configuration will be based on this.
    +   * @param username the username of the principal requesting the delegating token.
    +   * @return a delegation token
    +   */
    +  private[yarn] def obtainTokenForHiveMetastoreInner(conf: Configuration,
    +      username: String): Option[Token[DelegationTokenIdentifier]] = {
    +    val mirror = universe.runtimeMirror(getClass.getClassLoader)
    +
    +    // the hive configuration class is a subclass of Hadoop Configuration, so can be cast down
    +    // to a Configuration and used without reflection
    +    val hiveConfClass = mirror.classLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf")
    +    // using the (Configuration, Class) constructor allows the current configuratin to be included
    +    // in the hive config.
    +    val ctor = hiveConfClass.getDeclaredConstructor(classOf[Configuration],
    +      classOf[Object].getClass)
    +    val hiveConf = ctor.newInstance(conf, hiveConfClass).asInstanceOf[Configuration]
    +    val metastore_uri = hiveConf.getTrimmed("hive.metastore.uris", "")
    +
    +    // Check for local metastore
    +    if (metastore_uri.nonEmpty) {
    +      if (username.isEmpty) {
    +        throw new IllegalArgumentException(s"Username undefined")
    +      }
    +      val metastore_kerberos_principal_key = "hive.metastore.kerberos.principal"
    +      val principal = hiveConf.getTrimmed(metastore_kerberos_principal_key, "")
    +      if (principal.isEmpty) {
    --- End diff --
    
    use `require`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org