Posted to reviews@spark.apache.org by weiqingy <gi...@git.apache.org> on 2017/05/16 00:28:41 UTC

[GitHub] spark pull request #17989: [SPARK-6628][SQL] Fix ClassCastException when exe...

GitHub user weiqingy opened a pull request:

    https://github.com/apache/spark/pull/17989

    [SPARK-6628][SQL] Fix ClassCastException when executing sql statement 'insert into' on hbase table

    ## What changes were proposed in this pull request?
    
    The major issue of SPARK-6628 is:
    ```
    org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat 
    ```
    cannot be cast to
    ```
    org.apache.hadoop.hive.ql.io.HiveOutputFormat
    ```
    The reason is:
    ```
    public interface HiveOutputFormat<K, V> extends OutputFormat<K, V> {…}
    
    public class HiveHBaseTableOutputFormat extends
        TableOutputFormat<ImmutableBytesWritable> implements
        OutputFormat<ImmutableBytesWritable, Object> {...}
    ```
    The two snippets above show that `HiveHBaseTableOutputFormat` and `HiveOutputFormat` both extend or implement `OutputFormat`, but neither is a subtype of the other, so a `HiveHBaseTableOutputFormat` instance cannot be cast to `HiveOutputFormat`.
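    
    The cast failure described above can be reproduced in isolation. The following is a minimal sketch, not Spark or Hive code: the three types are hypothetical stand-ins that only mirror the inheritance shape of the real classes, and `SiblingCastDemo` is an illustrative name.
    ```scala
    import scala.util.Try
    
    // Hypothetical stand-ins for the Hadoop/Hive types; NOT the real classes.
    trait OutputFormat[K, V]
    trait HiveOutputFormat[K, V] extends OutputFormat[K, V]
    class HiveHBaseTableOutputFormat extends OutputFormat[String, AnyRef]
    
    object SiblingCastDemo {
      // Statically the value is only known to be an OutputFormat.
      val format: OutputFormat[String, AnyRef] = new HiveHBaseTableOutputFormat
    
      // The cast compiles, but throws ClassCastException at runtime because
      // HiveHBaseTableOutputFormat does not implement HiveOutputFormat.
      val castAttempt: Try[HiveOutputFormat[String, AnyRef]] =
        Try(format.asInstanceOf[HiveOutputFormat[String, AnyRef]])
    }
    ```
    Here `castAttempt` holds a failure wrapping a `ClassCastException`, which is the same kind of error shown in the stack trace later in this description.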
    
    Spark initializes `outputFormat` in `SparkHiveWriterContainer` in Spark 1.6, 2.0, and 2.1 (in `HiveFileFormat` in Spark 2.2 and master):
    ```
    @transient private lazy val outputFormat =
            jobConf.value.getOutputFormat.asInstanceOf[HiveOutputFormat[AnyRef, Writable]]
    ```
    Note that this code assumes the output format is a `HiveOutputFormat`. However, when users write data to HBase, the output format is `HiveHBaseTableOutputFormat`, which is not an instance of `HiveOutputFormat`.
    
    This PR sets `outputFormat` to `null` when the `OutputFormat` is not an instance of `HiveOutputFormat`. `outputFormat` is only used to get the file extension in `getFileExtension()`.
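    
    The idea can be sketched with a type-testing pattern match instead of an unconditional cast. This is a minimal sketch under stated assumptions, not the actual patch: the types are hypothetical stand-ins for the Hadoop/Hive classes, and `OutputFormatGuard`, `asHiveOutputFormat`, and `TextHiveOutputFormat` are illustrative names that do not exist in Spark or Hive.
    ```scala
    // Hypothetical stand-ins for the Hadoop/Hive types; NOT the real classes.
    trait OutputFormat[K, V]
    trait HiveOutputFormat[K, V] extends OutputFormat[K, V]
    class HiveHBaseTableOutputFormat extends OutputFormat[String, AnyRef]
    class TextHiveOutputFormat extends HiveOutputFormat[String, AnyRef]
    
    object OutputFormatGuard {
      // Returns the format when it is a HiveOutputFormat and null otherwise,
      // so that no ClassCastException is thrown for other OutputFormats.
      def asHiveOutputFormat(
          format: OutputFormat[String, AnyRef]): HiveOutputFormat[String, AnyRef] =
        format match {
          case hive: HiveOutputFormat[String, AnyRef] => hive
          case _ => null
        }
    }
    ```
    Returning `null` follows the description above; an `Option` would be the more idiomatic Scala choice if the caller did not need to pass the value on to a Java API.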
    
    Spark 2.x also has this issue, so the same change can also be submitted against the master branch.
    
    ## How was this patch tested?
    Tested manually.
    
    **Before:**
    
    A user writing to a Hive-HBase table from Spark SQL through `hiveContext` hit the following error:
    ```
    17/03/30 20:26:08 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
    17/03/30 20:26:08 INFO ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x25acf50c46d05ce
    17/03/30 20:26:08 INFO ZooKeeper: Session: 0x25acf50c46d05ce closed
    17/03/30 20:26:08 INFO ClientCnxn: EventThread shut down
    17/03/30 20:26:08 INFO ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x35acf50c63305c7
    17/03/30 20:26:08 INFO ZooKeeper: Session: 0x35acf50c63305c7 closed
    17/03/30 20:26:08 INFO ClientCnxn: EventThread shut down
    17/03/30 20:26:08 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 5)
    java.lang.ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.HiveOutputFormat
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat$lzycompute(hiveWriterContainers.scala:74)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat(hiveWriterContainers.scala:73)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.getOutputName(hiveWriterContainers.scala:93)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.initWriters(hiveWriterContainers.scala:119)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.executorSideSetup(hiveWriterContainers.scala:86)
    	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:102)
    	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
    	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    	at org.apache.spark.scheduler.Task.run(Task.scala:89)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    17/03/30 20:26:08 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 5, localhost): java.lang.ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.HiveOutputFormat
    ```
    
    The CREATE TABLE script is:
    ```
    CREATE TABLE `0bq_cntl.spark_load_cntl_stats`(
      `row_key` string COMMENT 'from deserializer',
      `application` string COMMENT 'from deserializer',
      `starttime` timestamp COMMENT 'from deserializer',
      `endtime` timestamp COMMENT 'from deserializer',
      `status` string COMMENT 'from deserializer',
      `statusid` smallint COMMENT 'from deserializer',
      `insertdate` timestamp COMMENT 'from deserializer',
      `count` int COMMENT 'from deserializer',
      `errordesc` string COMMENT 'from deserializer')
    ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES (
      'hbase.columns.mapping'='cf1:application,cf1:starttime,cf1:endtime,cf1:Status,cf1:StatusId,cf1:InsertDate,cf1:count,cf1:ErrorDesc',
      'line.delim'='\n',
      'mapkey.delim'='\u0003',
      'serialization.format'='\u0001')
    TBLPROPERTIES ('transient_lastDdlTime'='1489696241')
    ```
    The query run through Spark SQL is:
    ```
    val df=sqlContext.sql("Insert into table db1.spark_load_cntl_stats select 'AAM-846d55f6-0ffe-4694-b37a-1637a58f34f2','AAM','2017-03-21 04:03:01','2017-03-21 04:03:01','Started',45,'2017-03-21 04:03:01',1,'ad'")
    ```
    **After:**
    The ClassCastException is gone and the insert succeeds.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/weiqingy/spark SPARK-6628

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17989.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17989
    
----
commit 0fa2bb791d1fa9c37fe89c1942ce0ed950a9ee59
Author: Weiqing Yang <ya...@gmail.com>
Date:   2017-05-16T00:12:16Z

    [SPARK-6628][SQL] Fix ClassCastException when executing sql statement 'insert into' on hbase table

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    **[Test build #76952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76952/consoleFull)** for PR 17989 at commit [`0fa2bb7`](https://github.com/apache/spark/commit/0fa2bb791d1fa9c37fe89c1942ce0ed950a9ee59).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    **[Test build #77101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77101/consoleFull)** for PR 17989 at commit [`0fa2bb7`](https://github.com/apache/spark/commit/0fa2bb791d1fa9c37fe89c1942ce0ed950a9ee59).


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    Retest this please.


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by freshghost <gi...@git.apache.org>.
Github user freshghost commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    The table script you provided does not run successfully in Hive or in spark-sql on the terminal.


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    Spark 1.6 is a little old. I'll move the change to branch-2.1.


[GitHub] spark pull request #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastExcept...

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy closed the pull request at:

    https://github.com/apache/spark/pull/17989


[GitHub] spark pull request #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastExcept...

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17989#discussion_r116631706
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala ---
    @@ -70,8 +70,11 @@ private[hive] class SparkHiveWriterContainer(
       @transient protected lazy val committer = conf.value.getOutputCommitter
       @transient protected lazy val jobContext = newJobContext(conf.value, jID.value)
       @transient private lazy val taskContext = newTaskAttemptContext(conf.value, taID.value)
    -  @transient private lazy val outputFormat =
    -    conf.value.getOutputFormat.asInstanceOf[HiveOutputFormat[AnyRef, Writable]]
    +  @transient private lazy val outputFormat = conf.value.getOutputFormat match {
    --- End diff --
    
    outputFormat is only used in `[Utilities.getFileExtension(conf.value, fileSinkConf.getCompressed, outputFormat)](https://github.com/weiqingy/spark/blob/0fa2bb791d1fa9c37fe89c1942ce0ed950a9ee59/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala#L96)` to get the file extension. If `outputFormat` is "null", the extension will be "".


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77101/
    Test FAILed.


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    Jenkins, test this please


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76952/
    Test FAILed.


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    **[Test build #76953 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76953/consoleFull)** for PR 17989 at commit [`0fa2bb7`](https://github.com/apache/spark/commit/0fa2bb791d1fa9c37fe89c1942ce0ed950a9ee59).


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    **[Test build #76952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76952/consoleFull)** for PR 17989 at commit [`0fa2bb7`](https://github.com/apache/spark/commit/0fa2bb791d1fa9c37fe89c1942ce0ed950a9ee59).


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76953/
    Test FAILed.


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    **[Test build #77101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77101/consoleFull)** for PR 17989 at commit [`0fa2bb7`](https://github.com/apache/spark/commit/0fa2bb791d1fa9c37fe89c1942ce0ed950a9ee59).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17989: [SPARK-6628][SQL][Branch-1.6] Fix ClassCastException whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17989
  
    **[Test build #76953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76953/consoleFull)** for PR 17989 at commit [`0fa2bb7`](https://github.com/apache/spark/commit/0fa2bb791d1fa9c37fe89c1942ce0ed950a9ee59).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.

