You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by yuchaoran2011 <gi...@git.apache.org> on 2018/05/22 15:20:38 UTC

[GitHub] spark pull request #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error i...

GitHub user yuchaoran2011 opened a pull request:

    https://github.com/apache/spark/pull/21398

    [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentry-secured cluster

    ## What changes were proposed in this pull request?
    
    As detailed in the JIRA [ticket](https://issues.apache.org/jira/browse/SPARK-24338), Hive "CREATE TABLE" statement would fail when Sentry is used to manage authorization. This PR fixed this bug by porting the Cloudera fix.
    
    ## How was this patch tested?
    
    Tested that the table can be successfully created with the fix. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yuchaoran2011/spark SPARK-24338

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21398.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21398
    
----
commit 5d37f999991071e719d9f3dee48d61f245f60f90
Author: Chaoran Yu <yu...@...>
Date:   2018-05-22T15:11:43Z

    Fixed Hive CREATETABLE error

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error i...

Posted by skonto <gi...@git.apache.org>.
Github user skonto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21398#discussion_r190521452
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -230,11 +232,29 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         // specify location for managed table. And in [[CreateDataSourceTableAsSelectCommand]] we have
         // to create the table directory and write out data before we create this table, to avoid
         // exposing a partial written table.
    -    val needDefaultTableLocation = tableDefinition.tableType == MANAGED &&
    -      tableDefinition.storage.locationUri.isEmpty
    -
    -    val tableLocation = if (needDefaultTableLocation) {
    -      Some(CatalogUtils.stringToURI(defaultTablePath(tableDefinition.identifier)))
    +    //
    +    // When using a remote metastore, and if a managed table is being created with its
    +    // location explicitly set to the location where it would be created anyway, then do
    +    // not set its location explicitly. This avoids an issue with Sentry in secure clusters.
    +    // Otherwise, the above comment applies.
    --- End diff --
    
    >  The question is whether we want this workaround in Spark while Sentry is not fixed.
    We could open a bug for Sentry and since Spark uses hive which often comes with Sentry in enterprise envs then it would be good to point to that bug in the code , so when things get fixed we could change bug the code here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by yuchaoran2011 <gi...@git.apache.org>.
Github user yuchaoran2011 commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    I'm not confused about anything.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by skonto <gi...@git.apache.org>.
Github user skonto commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    @vanzin Could you pls review it? I know the fix exists on Cloudera's fork but right now since this is missing from the upstream project, it will affect all sentry users who dont use Cloudera's distro.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    **[Test build #90992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90992/testReport)** for PR 21398 at commit [`0f4be31`](https://github.com/apache/spark/commit/0f4be3110df5f7a92313260872994bc18685332e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    **[Test build #90994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90994/testReport)** for PR 21398 at commit [`5f64a95`](https://github.com/apache/spark/commit/5f64a95e9557feb3366cf2ac986de6da8ce165a4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    If that's the case then your user does not have permission to create the table. The issue that my patch works around is a bug in Sentry where creating a table *with the location specified to the same location as it would have been automatically assigned* does not work. My patch removes the explicit location in that case, and things work.
    
    But if you don't have permissions, you'll get an error regardless of my patch.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    Oh man, this thing. I didn't post this upstream because I think this is kinda hacky, and I also consider this primarily a Sentry bug. But if others are ok with it...
    
    Just make sure you have all the necessary changes. My initial patch caused a couple of breakages.
    https://github.com/cloudera/spark/commit/4c8289d
    https://github.com/cloudera/spark/commit/629b1c3
    https://github.com/cloudera/spark/commit/41fc8a1
    
    Also you probably shouldn't reference CDH bugs in the code here...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    I don't disagree with what you guys are saying, but I also don't have the context of why the original change was made.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error i...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21398#discussion_r190494840
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -230,11 +232,29 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         // specify location for managed table. And in [[CreateDataSourceTableAsSelectCommand]] we have
         // to create the table directory and write out data before we create this table, to avoid
         // exposing a partial written table.
    -    val needDefaultTableLocation = tableDefinition.tableType == MANAGED &&
    -      tableDefinition.storage.locationUri.isEmpty
    -
    -    val tableLocation = if (needDefaultTableLocation) {
    -      Some(CatalogUtils.stringToURI(defaultTablePath(tableDefinition.identifier)))
    +    //
    +    // When using a remote metastore, and if a managed table is being created with its
    +    // location explicitly set to the location where it would be created anyway, then do
    +    // not set its location explicitly. This avoids an issue with Sentry in secure clusters.
    +    // Otherwise, the above comment applies.
    --- End diff --
    
    > IIRC you can't change the location of the default database in Hive
    
    good to know that, maybe we should apply the same rule to Spark.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    I think you were confused by my example. It was not meant to be a workaround for Spark users (since currently Spark will just add the location if it's not there), it was meant to be used if creating the Sentry bug report ("this command should work but fails, and the corresponding command without the explicit location works").


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by blue666man <gi...@git.apache.org>.
Github user blue666man commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    To weigh in a bit, we're one of the larger enterprises that hit this bug, which we consider a regression going from Spark 1.6.3 (on Mesos, attempting to perform DDL to a Hive instance on CDH) to >= 2.1.x.  I think many more large enterprises will hit this with Spark on K8S to any Sentry-secured Hive databases.
    
    IMO, [this commit](https://github.com/apache/spark/commit/ce13c2672318242748f7520ed4ce6bcfad4fb428), while well intended in removing some of the hacks for 2.1.0, was a reversal of Spark's sort-of-implicit-commitment to have Sentry/Hive DDL work.  I agree this should be fixed upstream in Sentry too, but that would be a long wait for us:  creating tables in Mesos Spark 2.x is blocking 100's of devs from upgrading to Spark 2.
    
    Basically, the hacks / workarounds had existed in the past and were working.  Please add a workaround back until Sentry can fix the location issue on their side.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    (Although I'm pretty sure it will fail style checks.)
    
    Also, @cloud-fan since you wrote the original change.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error i...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21398#discussion_r190334757
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -230,11 +232,29 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         // specify location for managed table. And in [[CreateDataSourceTableAsSelectCommand]] we have
         // to create the table directory and write out data before we create this table, to avoid
         // exposing a partial written table.
    -    val needDefaultTableLocation = tableDefinition.tableType == MANAGED &&
    -      tableDefinition.storage.locationUri.isEmpty
    -
    -    val tableLocation = if (needDefaultTableLocation) {
    -      Some(CatalogUtils.stringToURI(defaultTablePath(tableDefinition.identifier)))
    +    //
    +    // When using a remote metastore, and if a managed table is being created with its
    +    // location explicitly set to the location where it would be created anyway, then do
    +    // not set its location explicitly. This avoids an issue with Sentry in secure clusters.
    +    // Otherwise, the above comment applies.
    --- End diff --
    
    > In general I think it's OK to always set the location for managed tables
    
    BTW I don't disagree with that and as I said I believe this is ultimately a bug in Sentry, not in Spark. The question is whether we want this workaround in Spark while Sentry is not fixed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by skonto <gi...@git.apache.org>.
Github user skonto commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    @vanzin any decision on this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by skonto <gi...@git.apache.org>.
Github user skonto commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    thank you @vanzin. Tricky thing to debug btw :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by yuchaoran2011 <gi...@git.apache.org>.
Github user yuchaoran2011 commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    You said "If that's the case then your user does not have permission to create the table." What I'm saying is the user did have the permission.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    I don't think working around a bug in Sentry is a good reason to add a hack in Spark. `HiveExternalCatalog` is already full of hacks...
    
    If you have a proposal to clean that up and can also work around the Sentry bug, that will be great.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    **[Test build #90992 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90992/testReport)** for PR 21398 at commit [`0f4be31`](https://github.com/apache/spark/commit/0f4be3110df5f7a92313260872994bc18685332e).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    You said: "even when LOCATION is not specified, Sentry would still not allow the table to be created."
    
    I said that if that's the case, then your user doesn't have the needed permission. Because all my patch does is remove the location.
    
    I'm still confused about what it is that you're confused about.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90994/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error i...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21398#discussion_r190307041
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -230,11 +232,29 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         // specify location for managed table. And in [[CreateDataSourceTableAsSelectCommand]] we have
         // to create the table directory and write out data before we create this table, to avoid
         // exposing a partial written table.
    -    val needDefaultTableLocation = tableDefinition.tableType == MANAGED &&
    -      tableDefinition.storage.locationUri.isEmpty
    -
    -    val tableLocation = if (needDefaultTableLocation) {
    -      Some(CatalogUtils.stringToURI(defaultTablePath(tableDefinition.identifier)))
    +    //
    +    // When using a remote metastore, and if a managed table is being created with its
    +    // location explicitly set to the location where it would be created anyway, then do
    +    // not set its location explicitly. This avoids an issue with Sentry in secure clusters.
    +    // Otherwise, the above comment applies.
    --- End diff --
    
    We have comments saying
    ```
    // We can't leave `locationUri` empty and count on Hive metastore to set a default table
    // location, because Hive metastore uses hive.metastore.warehouse.dir to generate default
    // table location for tables in default database, while we expect to use the location of
    // default database.
    ```
    
    In general I think it's OK to always set the location for managed tables, and I feel a little weird if some systems forbid it. Hive supports setting a location for managed tables, how shall we handle that?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by yuchaoran2011 <gi...@git.apache.org>.
Github user yuchaoran2011 commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    @vanzin But in Spark SQL, even when LOCATION is not specified, Sentry would still not allow the table to be created.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    I didn't make the original change so I'm refraining from making a decision about whether this should go in. Perhaps another fix is desired, perhaps the current behavior is desired, but either way this isn't an area where I'm comfortable making these kinds of decisions.
    
    (CDH is another thing altogether.)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by skonto <gi...@git.apache.org>.
Github user skonto commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    @vanzin should we merge this or add a comment in the docs?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error i...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21398#discussion_r190327819
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -230,11 +232,29 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         // specify location for managed table. And in [[CreateDataSourceTableAsSelectCommand]] we have
         // to create the table directory and write out data before we create this table, to avoid
         // exposing a partial written table.
    -    val needDefaultTableLocation = tableDefinition.tableType == MANAGED &&
    -      tableDefinition.storage.locationUri.isEmpty
    -
    -    val tableLocation = if (needDefaultTableLocation) {
    -      Some(CatalogUtils.stringToURI(defaultTablePath(tableDefinition.identifier)))
    +    //
    +    // When using a remote metastore, and if a managed table is being created with its
    +    // location explicitly set to the location where it would be created anyway, then do
    +    // not set its location explicitly. This avoids an issue with Sentry in secure clusters.
    +    // Otherwise, the above comment applies.
    --- End diff --
    
    I'm not sure I really follow that comment. IIRC you can't change the location of the default database in Hive, so I'm not sure how you'd hit a situation where "default db uri != warehouse dir", at least when using a remote metastore.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    Yes, that's exactly what I said. I'm not sure what part is confusing you.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90992/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    **[Test build #90994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90994/testReport)** for PR 21398 at commit [`5f64a95`](https://github.com/apache/spark/commit/5f64a95e9557feb3366cf2ac986de6da8ce165a4).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by yuchaoran2011 <gi...@git.apache.org>.
Github user yuchaoran2011 commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    I can double check. But I remember clearly that the user did have all the necessary privileges. Without this patch, a permission error was thrown. With this patch it worked fine. The CREATE TABLE statement didn't involve a LOCATION.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by skonto <gi...@git.apache.org>.
Github user skonto commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    @vanzin @cloud-fan. Going back to the facts. The fix/work-around in Cloudera's distro obviously exists because the distro integrates with Sentry anyway. Removing the location did fix the Spark Job at the customer side. To re-phrase the last request from @cloud-fan, if we remove the explicit location what are the consequences, when do you need the explicit location for managed tables? Can this be a default behavior in general?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by yuchaoran2011 <gi...@git.apache.org>.
Github user yuchaoran2011 commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    Thanks @vanzin. I've verified this commit has all the changes you mentioned. I've also removed references to CDH in the code comments.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21398: [SPARK-24338][SQL] Fixed Hive CREATETABLE error in Sentr...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21398
  
    I'm kinda neutral on what should happen, except maybe for filing a Sentry bug. I'll let @cloud-fan decide whether this makes sense in Spark.
    
    For the Sentry bug, there's an easy way to reproduce this, just run the following (either in Spark or using the old Hive CLI - not beeline nor jdbc):
    
    ```
    USE some_db;
    CREATE TABLE foo(idx int) LOCATION 'hdfs://blah.example.com/user/hive/warehouse/some_db.db/foo';
    ```
    
    Even if the user has permissions to create table on that database, Sentry will not allow it. Works fine if you remove the explicit location.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org