Posted to reviews@spark.apache.org by BruceXu1991 <gi...@git.apache.org> on 2017/12/20 13:15:26 UTC

[GitHub] spark pull request #20034: [SPARK-22846][SQL] Fix table owner is null when c...

GitHub user BruceXu1991 opened a pull request:

    https://github.com/apache/spark/pull/20034

    [SPARK-22846][SQL] Fix table owner is null when creating table through spark sql or thriftserver

    ## What changes were proposed in this pull request?
    Fix the table owner being null when a new table is created through Spark SQL.
    
    ## How was this patch tested?
    Manual test:
    1. First, create a table.
    2. Query the table properties in the MySQL database backing the Hive metastore.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BruceXu1991/spark SPARK-22846

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20034.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20034
    
----
commit e8c3035028e6242005806476f5ce7cbdad5af889
Author: xu.wenchun <xu...@...>
Date:   2017-12-20T13:05:13Z

    fix SPARK-22846

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20034: [SPARK-22846][SQL] Fix table owner is null when c...

Posted by BruceXu1991 <gi...@git.apache.org>.
Github user BruceXu1991 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20034#discussion_r158472814
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -186,7 +186,7 @@ private[hive] class HiveClientImpl(
       /** Returns the configuration for the current session. */
       def conf: HiveConf = state.getConf
     
    -  private val userName = state.getAuthenticator.getUserName
    +  private val userName = conf.getUser
    --- End diff --
    
    Yes, I hit this problem when using MySQL as the Hive metastore.
    What's more, when I execute DESCRIBE FORMATTED spark_22846, a NullPointerException occurs.
    
    ```
     > DESCRIBE FORMATTED offline.spark_22846;
    Error: java.lang.NullPointerException (state=,code=0)
    ```
    
    The detailed stack trace:
    ```
    17/12/22 18:18:10 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
    java.lang.NullPointerException
            at scala.collection.immutable.StringOps$.length$extension(StringOps.scala:47)
            at scala.collection.immutable.StringOps.length(StringOps.scala:47)
            at scala.collection.IndexedSeqOptimized$class.isEmpty(IndexedSeqOptimized.scala:27)
            at scala.collection.immutable.StringOps.isEmpty(StringOps.scala:29)
            at scala.collection.TraversableOnce$class.nonEmpty(TraversableOnce.scala:111)
            at scala.collection.immutable.StringOps.nonEmpty(StringOps.scala:29)
            at org.apache.spark.sql.catalyst.catalog.CatalogTable.toLinkedHashMap(interface.scala:301)
            at org.apache.spark.sql.execution.command.DescribeTableCommand.describeFormattedTableInfo(tables.scala:559)
            at org.apache.spark.sql.execution.command.DescribeTableCommand.run(tables.scala:537)
            at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
            at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
            at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
            at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
            at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68)
            at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:767)
            at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)
    ```
    The NPE occurs because owner is null. The relevant source code is below:
    
    ```
    def toLinkedHashMap: mutable.LinkedHashMap[String, String] = {
    .........
    line 301: if (owner.nonEmpty) map.put("Owner", owner)
    ........
    }
    ```
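
The failure mode described above can be reproduced outside Spark. The following is a minimal Java sketch, not Spark's code: `OwnerField`, `describeUnsafe`, and `describeSafe` are hypothetical names, and the map only stands in for the one built by `toLinkedHashMap`. It illustrates that calling a method on a null owner throws, and that a null check avoids the exception.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the "Owner" handling in CatalogTable.toLinkedHashMap.
final class OwnerField {
    private OwnerField() {}

    // Mirrors the failing pattern: dereferences owner without a null check.
    static Map<String, String> describeUnsafe(String owner) {
        Map<String, String> map = new LinkedHashMap<>();
        if (!owner.isEmpty()) { // throws NullPointerException when owner is null
            map.put("Owner", owner);
        }
        return map;
    }

    // Null-safe variant: a null or empty owner is simply omitted.
    static Map<String, String> describeSafe(String owner) {
        Map<String, String> map = new LinkedHashMap<>();
        if (owner != null && !owner.isEmpty()) {
            map.put("Owner", owner);
        }
        return map;
    }
}
```

A guard like `describeSafe` would only hide the symptom in DESCRIBE FORMATTED. The patch in this pull request takes a different route and changes where the username comes from, so that the owner is not null in the first place.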



---



[GitHub] spark pull request #20034: [SPARK-22846][SQL] Fix table owner is null when c...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20034#discussion_r158501959
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -186,7 +186,7 @@ private[hive] class HiveClientImpl(
       /** Returns the configuration for the current session. */
       def conf: HiveConf = state.getConf
     
    -  private val userName = state.getAuthenticator.getUserName
    +  private val userName = conf.getUser
    --- End diff --
    
    Do you know how Hive gets the username internally?


---



[GitHub] spark pull request #20034: [SPARK-22846][SQL] Fix table owner is null when c...

Posted by BruceXu1991 <gi...@git.apache.org>.
Github user BruceXu1991 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20034#discussion_r158577749
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -186,7 +186,7 @@ private[hive] class HiveClientImpl(
       /** Returns the configuration for the current session. */
       def conf: HiveConf = state.getConf
     
    -  private val userName = state.getAuthenticator.getUserName
    +  private val userName = conf.getUser
    --- End diff --
    
    Well, consider Spark 2.2.1's current implementation:
    ```
    private val userName = state.getAuthenticator.getUserName
    ```
    When state.getAuthenticator is a **HadoopDefaultAuthenticator**, which is the default in the Hive conf, the username is obtained correctly.
    
    However, when state.getAuthenticator is a **SessionStateUserAuthenticator**, as in my case, the username is null.
    
    The simplified code below explains why:
    1) HadoopDefaultAuthenticator
    ```
    public class HadoopDefaultAuthenticator implements HiveAuthenticationProvider {
      @Override
      public String getUserName() {
        return userName;
      }
    
      @Override
      public void setConf(Configuration conf) {
        this.conf = conf;
        UserGroupInformation ugi = null;
        try {
          ugi = Utils.getUGI();
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
        this.userName = ugi.getShortUserName();
        if (ugi.getGroupNames() != null) {
          this.groupNames = Arrays.asList(ugi.getGroupNames());
        }
      }
    }
    
    public class Utils {
      public static UserGroupInformation getUGI() throws LoginException, IOException {
        String doAs = System.getenv("HADOOP_USER_NAME");
        if(doAs != null && doAs.length() > 0) {
          return UserGroupInformation.createProxyUser(doAs, UserGroupInformation.getLoginUser());
        }
        return UserGroupInformation.getCurrentUser();
      }
    }
    ```
    This shows that HadoopDefaultAuthenticator gets the username through Utils.getUGI(), so the username is HADOOP_USER_NAME if it is set, or otherwise the current user.
    
    2)  SessionStateUserAuthenticator
    ```
    public class SessionStateUserAuthenticator implements HiveAuthenticationProvider {
      @Override
      public void setConf(Configuration arg0) {
      }
    
      @Override
      public String getUserName() {
        return sessionState.getUserName();
      }
    }
    ```
    This shows that SessionStateUserAuthenticator gets the username through sessionState.getUserName(), which is null. Here is the [instantiation of SessionState in HiveClientImpl](https://github.com/apache/spark/blob/1cf3e3a26961d306eb17b7629d8742a4df45f339/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L187)
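
The difference between the two providers can be modelled in a few lines. This is a simplified sketch, not Hive's code: `UserNameProvider`, `EnvOrLoginProvider`, `SessionProvider`, and `UserNames.resolveUser` are hypothetical names, and plain strings stand in for `HADOOP_USER_NAME`, the login user, and the session user.

```java
import java.util.function.Supplier;

// Hypothetical model of the two username lookup strategies discussed above.
interface UserNameProvider {
    String getUserName();
}

// Models HadoopDefaultAuthenticator: an override name wins, else the login user.
final class EnvOrLoginProvider implements UserNameProvider {
    private final String override;  // stands in for HADOOP_USER_NAME
    private final String loginUser; // stands in for the current login user

    EnvOrLoginProvider(String override, String loginUser) {
        this.override = override;
        this.loginUser = loginUser;
    }

    @Override
    public String getUserName() {
        return (override != null && !override.isEmpty()) ? override : loginUser;
    }
}

// Models SessionStateUserAuthenticator: returns whatever the session was given.
final class SessionProvider implements UserNameProvider {
    private final String sessionUser; // null when the session was created without a user

    SessionProvider(String sessionUser) {
        this.sessionUser = sessionUser;
    }

    @Override
    public String getUserName() {
        return sessionUser;
    }
}

final class UserNames {
    private UserNames() {}

    // Falls back to a second source when the provider yields null.
    static String resolveUser(UserNameProvider provider, Supplier<String> fallback) {
        String name = provider.getUserName();
        return name != null ? name : fallback.get();
    }
}
```

In this model, `new SessionProvider(null).getUserName()` is null, which corresponds to the owner-less table described above. `resolveUser` only illustrates one possible guard; the patch itself does not add a fallback and instead reads the name from `conf.getUser`.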


---



[GitHub] spark issue #20034: [SPARK-22846][SQL] Fix table owner is null when creating...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20034
  
    **[Test build #85270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85270/testReport)** for PR 20034 at commit [`e8c3035`](https://github.com/apache/spark/commit/e8c3035028e6242005806476f5ce7cbdad5af889).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20034: [SPARK-22846][SQL] Fix table owner is null when c...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20034


---



[GitHub] spark pull request #20034: [SPARK-22846][SQL] Fix table owner is null when c...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20034#discussion_r158333472
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -186,7 +186,7 @@ private[hive] class HiveClientImpl(
       /** Returns the configuration for the current session. */
       def conf: HiveConf = state.getConf
     
    -  private val userName = state.getAuthenticator.getUserName
    +  private val userName = conf.getUser
    --- End diff --
    
    So, does this happen when MySQL is used as the Hive metastore?


---



[GitHub] spark pull request #20034: [SPARK-22846][SQL] Fix table owner is null when c...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20034#discussion_r158319646
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -186,7 +186,7 @@ private[hive] class HiveClientImpl(
       /** Returns the configuration for the current session. */
       def conf: HiveConf = state.getConf
     
    -  private val userName = state.getAuthenticator.getUserName
    --- End diff --
    
    Why does this return null?


---



[GitHub] spark issue #20034: [SPARK-22846][SQL] Fix table owner is null when creating...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20034
  
    Can you add a test?


---



[GitHub] spark issue #20034: [SPARK-22846][SQL] Fix table owner is null when creating...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20034
  
    ok to test


---



[GitHub] spark issue #20034: [SPARK-22846][SQL] Fix table owner is null when creating...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20034
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85270/
    Test PASSed.


---



[GitHub] spark issue #20034: [SPARK-22846][SQL] Fix table owner is null when creating...

Posted by BruceXu1991 <gi...@git.apache.org>.
Github user BruceXu1991 commented on the issue:

    https://github.com/apache/spark/pull/20034
  
    @cloud-fan @gatorsmile Could you review this issue?


---



[GitHub] spark issue #20034: [SPARK-22846][SQL] Fix table owner is null when creating...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20034
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #20034: [SPARK-22846][SQL] Fix table owner is null when creating...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20034
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20034: [SPARK-22846][SQL] Fix table owner is null when c...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20034#discussion_r158332740
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -186,7 +186,7 @@ private[hive] class HiveClientImpl(
       /** Returns the configuration for the current session. */
       def conf: HiveConf = state.getConf
     
    -  private val userName = state.getAuthenticator.getUserName
    +  private val userName = conf.getUser
    --- End diff --
    
    @BruceXu1991. I want to reproduce your problem here. Could you describe your environment more specifically? For me, 2.2.1 works like the following.
    ```scala
    scala> spark.version
    res0: String = 2.2.1
    
    scala> sql("CREATE TABLE spark_22846(a INT)")
    
    scala> sql("DESCRIBE FORMATTED spark_22846").show
    +--------------------+--------------------+-------+
    |            col_name|           data_type|comment|
    +--------------------+--------------------+-------+
    |                   a|                 int|   null|
    |                    |                    |       |
    |# Detailed Table ...|                    |       |
    |            Database|             default|       |
    |               Table|         spark_22846|       |
    |               Owner|            dongjoon|       |
    ```


---



[GitHub] spark issue #20034: [SPARK-22846][SQL] Fix table owner is null when creating...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20034
  
    **[Test build #85270 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85270/testReport)** for PR 20034 at commit [`e8c3035`](https://github.com/apache/spark/commit/e8c3035028e6242005806476f5ce7cbdad5af889).


---



[GitHub] spark issue #20034: [SPARK-22846][SQL] Fix table owner is null when creating...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20034
  
    thanks, merging to master!


---
