Posted to reviews@spark.apache.org by weiqingy <gi...@git.apache.org> on 2017/05/27 05:37:45 UTC

[GitHub] spark pull request #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastExcept...

GitHub user weiqingy opened a pull request:

    https://github.com/apache/spark/pull/18127

    [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException when executing sql statement 'insert into' on hbase table

    ## What changes were proposed in this pull request?
    
    The issue of SPARK-6628 is:
    ```
    org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat 
    ```
    cannot be cast to
    ```
    org.apache.hadoop.hive.ql.io.HiveOutputFormat
    ```
    The reason is:
    ```
    public interface HiveOutputFormat<K, V> extends OutputFormat<K, V> {…}
    
    public class HiveHBaseTableOutputFormat extends
        TableOutputFormat<ImmutableBytesWritable> implements
        OutputFormat<ImmutableBytesWritable, Object> {...}
    ```
    From the two snippets above, we can see that `HiveOutputFormat` extends `OutputFormat` and `HiveHBaseTableOutputFormat` implements it. The two types share a supertype, but neither is a subtype of the other, so an instance of one cannot be cast to the other.
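    
    As a minimal, self-contained sketch of why the cast fails, the snippet below uses hypothetical stand-in types (`OutputFormat`, `HiveOutputFormat`, `HBaseLikeOutputFormat`) that mirror only the shape of the real Hadoop/Hive classes. It is an illustration, not Spark or Hive code:
    
    ```scala
    object SiblingCastSketch {
      // Hypothetical stand-ins that mirror only the shape of the real Hadoop/Hive types.
      trait OutputFormat[K, V]
      trait HiveOutputFormat[K, V] extends OutputFormat[K, V] {
        def label: String = "hive"
      }
      // Implements the common supertype, but NOT HiveOutputFormat.
      class HBaseLikeOutputFormat extends OutputFormat[String, AnyRef]
    
      // True only when the runtime class really implements HiveOutputFormat.
      def isHiveFormat(format: OutputFormat[_, _]): Boolean =
        format.isInstanceOf[HiveOutputFormat[_, _]]
    
      // Attempts the unchecked cast and uses the result; a failed cast
      // (ClassCastException) is captured as a Failure instead of propagating.
      def tryCast(format: OutputFormat[_, _]): scala.util.Try[String] =
        scala.util.Try(format.asInstanceOf[HiveOutputFormat[AnyRef, AnyRef]].label)
    
      def main(args: Array[String]): Unit = {
        val hbaseLike = new HBaseLikeOutputFormat
        println(isHiveFormat(hbaseLike))
        println(tryCast(hbaseLike).isFailure)
      }
    }
    ```
    
    The cast compiles because both types are related through `OutputFormat`; the mismatch only surfaces at runtime, which matches the behavior reported in the issue.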
    
    For Spark 1.6, 2.0, and 2.1, Spark initializes `outputFormat` in `SparkHiveWriterContainer`. For Spark 2.2+, Spark initializes `outputFormat` in `HiveFileFormat`.
    ```
    @transient private lazy val outputFormat =
            jobConf.value.getOutputFormat.asInstanceOf[HiveOutputFormat[AnyRef, Writable]]
    ```
    The `outputFormat` above has to be a `HiveOutputFormat`. However, when users insert data into an HBase table, the output format is `HiveHBaseTableOutputFormat`, which is not an instance of `HiveOutputFormat`.
    
    This PR sets `outputFormat` to `null` when the `OutputFormat` is not an instance of `HiveOutputFormat`. This change should be safe since `outputFormat` is only used to get the file extension in the function [`getFileExtension()`](https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala#L101).
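    
    The idea can be sketched as follows. This is not the actual patch: the types, the `extension` method, and the empty-string fallback are all hypothetical and only illustrate a type-tested cast with a `null` fallback plus a consumer that tolerates `null`:
    
    ```scala
    object GuardedCastSketch {
      // Hypothetical stand-ins; the real types live in Hadoop and Hive.
      trait OutputFormat
      trait HiveOutputFormat extends OutputFormat {
        def extension: String
      }
      class TextLikeFormat extends HiveOutputFormat {
        def extension: String = ".txt"
      }
      class HBaseLikeFormat extends OutputFormat
    
      // Returns the format as a HiveOutputFormat, or null when it is not one,
      // mirroring the "null" fallback described in the PR text.
      def hiveFormatOrNull(format: OutputFormat): HiveOutputFormat = format match {
        case hive: HiveOutputFormat => hive
        case _ => null
      }
    
      // The only consumer must tolerate null. Falling back to an empty
      // extension is an assumption made for this sketch.
      def fileExtension(format: OutputFormat): String =
        Option(hiveFormatOrNull(format)).map(_.extension).getOrElse("")
    
      def main(args: Array[String]): Unit = {
        println(fileExtension(new TextLikeFormat))
        println(fileExtension(new HBaseLikeFormat))
      }
    }
    ```
    
    Wrapping the possibly-null value in `Option` keeps the null check in one place, so callers of `fileExtension` never see the `null`.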
    
    We can also submit this PR to the master branch.
    
    ## How was this patch tested?
    Tested manually.
    (1) Create an HBase table with Hive:
    ```
    CREATE TABLE testwq100 (
      row_key string COMMENT 'from deserializer',
      application string COMMENT 'from deserializer',
      starttime timestamp COMMENT 'from deserializer',
      endtime timestamp COMMENT 'from deserializer',
      status string COMMENT 'from deserializer',
      statusid smallint COMMENT 'from deserializer',
      insertdate timestamp COMMENT 'from deserializer',
      count int COMMENT 'from deserializer',
      errordesc string COMMENT 'from deserializer')
    ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES (
      'hbase.columns.mapping'='cf1:application,cf1:starttime,cf1:endtime,cf1:Status,cf1:StatusId,cf1:InsertDate,cf1:count,cf1:ErrorDesc',
      'line.delim'='\\n',
      'mapkey.delim'='\\u0003',
      'serialization.format'='\\u0001')
    TBLPROPERTIES (
      'transient_lastDdlTime'='1489696241',
      'hbase.table.name' = 'xyz',
      'hbase.mapred.output.outputtable' = 'xyz')
    ```
    (2) Verify:
    
    **Before:**
    
    Insert data into the HBase table `testwq100` from Spark SQL:
    ```
    scala> sql(s"INSERT INTO testwq100 VALUES ('AA1M22','AA1M122','2011722','201156','Starte1d6',45,20,1,'ad1')")
    17/05/26 00:09:10 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
    java.lang.ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.HiveOutputFormat
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat$lzycompute(hiveWriterContainers.scala:82)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat(hiveWriterContainers.scala:81)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.getOutputName(hiveWriterContainers.scala:101)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.initWriters(hiveWriterContainers.scala:125)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.executorSideSetup(hiveWriterContainers.scala:94)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.writeToFile(hiveWriterContainers.scala:182)
    	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:210)
    	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:210)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    	at org.apache.spark.scheduler.Task.run(Task.scala:99)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    17/05/26 00:09:10 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 3, localhost, executor driver): java.lang.ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.HiveOutputFormat
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat$lzycompute(hiveWriterContainers.scala:82)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat(hiveWriterContainers.scala:81)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.getOutputName(hiveWriterContainers.scala:101)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.initWriters(hiveWriterContainers.scala:125)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.executorSideSetup(hiveWriterContainers.scala:94)
    	at org.apache.spark.sql.hive.SparkHiveWriterContainer.writeToFile(hiveWriterContainers.scala:182)
    	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:210)
    	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:210)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    	at org.apache.spark.scheduler.Task.run(Task.scala:99)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    ```
    
    **After:**
    ```
    scala> sql(s"INSERT INTO testwq100 VALUES ('AA1M22','AA1M122','2011722','201156','Starte1d6',45,20,1,'ad1')")
    res2: org.apache.spark.sql.DataFrame = []
    
    scala> sql("select * from testwq100").show
    +-------+-----------+---------+-------+---------+--------+--------------------+-----+---------+
    |row_key|application|starttime|endtime|   status|statusid|          insertdate|count|errordesc|
    +-------+-----------+---------+-------+---------+--------+--------------------+-----+---------+
    |   AA1M|       AA1M|     null|   null| Starte1d|      45|                null|    1|      ad1|
    | AA1M22|    AA1M122|     null|   null|Starte1d6|      45|1970-01-01 00:00:...|    1|      ad1|
    +-------+-----------+---------+-------+---------+--------+--------------------+-----+---------+
    ```
    The `ClassCastException` is gone and the insert succeeds.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/weiqingy/spark SPARK_6628

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18127.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18127
    
----
commit 6a622b071fdf2e86e1849a4473cf9525e2ae3de0
Author: Weiqing Yang <ya...@gmail.com>
Date:   2017-05-27T04:27:42Z

    [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException when executing sql statement 'insert into' on hbase table

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18127
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastExcept...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18127




[GitHub] spark issue #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18127
  
    **[Test build #77451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77451/testReport)** for PR 18127 at commit [`6a622b0`](https://github.com/apache/spark/commit/6a622b071fdf2e86e1849a4473cf9525e2ae3de0).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18127
  
    **[Test build #77451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77451/testReport)** for PR 18127 at commit [`6a622b0`](https://github.com/apache/spark/commit/6a622b071fdf2e86e1849a4473cf9525e2ae3de0).




[GitHub] spark issue #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18127
  
    **[Test build #77449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77449/testReport)** for PR 18127 at commit [`6a622b0`](https://github.com/apache/spark/commit/6a622b071fdf2e86e1849a4473cf9525e2ae3de0).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException whe...

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/18127
  
    Jenkins, test this please.




[GitHub] spark issue #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18127
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77449/
    Test FAILed.




[GitHub] spark issue #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException whe...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18127
  
    Hi @weiqingy, I was just wondering whether this is still in progress.




[GitHub] spark issue #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18127
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77451/
    Test FAILed.




[GitHub] spark issue #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException whe...

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/18127
  
    Thanks, @HyukjinKwon. Yes, but I will come back to this after I finish other work. Do I need to close this for now and reopen it at that time?




[GitHub] spark issue #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException whe...

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/18127
  
    I think the build errors are not related to these code changes:
    
    ```
    java.lang.RuntimeException: 1 fatal warnings
    	at scala.sys.package$.error(package.scala:27)
    	at SparkBuild$$anonfun$sharedSettings$18.apply(SparkBuild.scala:333)
    	at SparkBuild$$anonfun$sharedSettings$18.apply(SparkBuild.scala:302)
    	at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
    	at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:40)
    	at sbt.std.Transform$$anon$4.work(System.scala:63)
    	at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
    	at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:228)
    	at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:17)
    	at sbt.Execute.work(Execute.scala:237)
    	at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
    	at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:228)
    	at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:159)
    	at sbt.CompletionService$$anon$2.call(CompletionService.scala:28)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    [error] (hive/compile:compile) 1 fatal warnings
    [error] Total time: 276 s, completed May 26, 2017 11:19:13 PM
    ```




[GitHub] spark issue #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18127
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException whe...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18127
  
    Thanks for your input, @weiqingy. I was just suggesting that we close PRs that have been inactive for a month after review comments and/or an unsuccessful Jenkins test result (for a good reason, of course). Would that take longer than a month?




[GitHub] spark issue #18127: [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18127
  
    **[Test build #77449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77449/testReport)** for PR 18127 at commit [`6a622b0`](https://github.com/apache/spark/commit/6a622b071fdf2e86e1849a4473cf9525e2ae3de0).

