Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/01/08 19:17:03 UTC

[GitHub] [spark] DeyinZhong opened a new pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

DeyinZhong opened a new pull request #29178:
URL: https://github.com/apache/spark/pull/29178


   
   ### What changes were proposed in this pull request?
   This PR modifies TableReader.scala to create OldHadoopRDD when the input format is 'org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat', because the default NewHadoopRDD cannot access the HBase table.
   Reference link: https://issues.apache.org/jira/browse/SPARK-32380
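   
   The condition this PR changes can be pictured with a small, self-contained sketch. The traits and classes below (`OldApiInputFormat`, `NewApiInputFormat`, `DualApiFormat`, `NewOnlyFormat`, `RddChoice`) are hypothetical stand-ins, not the real Hadoop or Spark types; only the shape of the check is taken from the diff in this PR.
   
   ```scala
   // Hypothetical stand-ins for the two Hadoop InputFormat hierarchies.
   trait OldApiInputFormat            // plays the role of org.apache.hadoop.mapred.InputFormat
   abstract class NewApiInputFormat   // plays the role of org.apache.hadoop.mapreduce.InputFormat

   // A format reachable through both APIs, as this thread describes for HiveHBaseTableInputFormat.
   class DualApiFormat extends NewApiInputFormat with OldApiInputFormat
   // A format that only implements the new API.
   class NewOnlyFormat extends NewApiInputFormat

   object RddChoice {
     // Mirrors the patched condition: take the new-API path only when the class
     // implements the new API AND its name is not the excluded one.
     def useNewHadoopRDD(inputFormatClazz: Class[_], excludedName: String): Boolean =
       !inputFormatClazz.getName.equalsIgnoreCase(excludedName) &&
         classOf[NewApiInputFormat].isAssignableFrom(inputFormatClazz)
   }
   ```
   
   With the excluded name set to the dual-API class, `useNewHadoopRDD` returns false for that class and true for `NewOnlyFormat`, so only the excluded format falls back to the old-API path.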
   
   
   
   - environments:
   hadoop 2.8.5 
   hive 2.3.7 
   spark 3.0.0 
   hbase 1.4.9
   
   ### Why are the changes needed?
   When Spark SQL queries a Hive table whose data is stored in HBase, the query fails with an exception. This PR fixes that bug.
   
   ### Does this PR introduce _any_ user-facing change?
   no
   
   
   ### How was this patch tested?
   
   - step1: create hbase table
   
   ```
    hbase(main):001:0> create 'hbase_test', 'cf1'
    hbase(main):001:0> put 'hbase_test', 'r1', 'cf1:c1', '123'
   
   ```
   
   
   - step2: create a Hive table mapped to the HBase table
   
   hive> 
   ```
   CREATE EXTERNAL TABLE `hivetest.hbase_test`(
     `key` string COMMENT '', 
     `value` string COMMENT '')
   ROW FORMAT SERDE 
     'org.apache.hadoop.hive.hbase.HBaseSerDe' 
   STORED BY 
     'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
   WITH SERDEPROPERTIES ( 
     'hbase.columns.mapping'=':key,cf1:v1', 
     'serialization.format'='1')
   TBLPROPERTIES (
     'hbase.table.name'='hbase_test')
   ```
    
   
   - step3: query the Hive table backed by HBase with Spark SQL
   
   `spark-sql --master yarn -e "select * from hivetest.hbase_test"`
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-756985069


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38443/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661880783


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661880783


   Can one of the admins verify this patch?




[GitHub] [spark] dongjoon-hyun commented on pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661932171


   ok to test




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661880089


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-688598014








[GitHub] [spark] premsagarreddy commented on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
premsagarreddy commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-825436315


   @HyukjinKwon could you please share the steps to resolve the problem of Spark 3.0 accessing a Hive table whose data is in HBase?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-662031150








[GitHub] [spark] AmplabJenkins commented on pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661935354








[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661931705


   Since this might be a regression due to SPARK-26630 , cc @gatorsmile and @HyukjinKwon , too.




[GitHub] [spark] AmplabJenkins commented on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-757026185


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133854/
   




[GitHub] [spark] github-actions[bot] closed pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #29178:
URL: https://github.com/apache/spark/pull/29178


   




[GitHub] [spark] SparkQA commented on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-756966751


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38443/
   




[GitHub] [spark] premsagarreddy edited a comment on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
premsagarreddy edited a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-825436315


   @HyukjinKwon could you please share the steps to resolve the problem of Spark 3.0 accessing a Hive table whose data is in HBase?




[GitHub] [spark] SparkQA removed a comment on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-756950631


   **[Test build #133854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133854/testReport)** for PR 29178 at commit [`d170e24`](https://github.com/apache/spark/commit/d170e2463cc7865f2907a861f6d77bbd3e92ca23).




[GitHub] [spark] dongjoon-hyun commented on pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661931705


   Since this seems to be a regression due to SPARK-26630 , cc @gatorsmile and @HyukjinKwon .




[GitHub] [spark] AmplabJenkins commented on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-757001905








[GitHub] [spark] DeyinZhong commented on a change in pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
DeyinZhong commented on a change in pull request #29178:
URL: https://github.com/apache/spark/pull/29178#discussion_r458500082



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
##########
@@ -299,7 +299,9 @@ class HadoopTableReader(
    */
   private def createHadoopRDD(localTableDesc: TableDesc, inputPathStr: String): RDD[Writable] = {
     val inputFormatClazz = localTableDesc.getInputFileFormatClass
-    if (classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz)) {
+    if (!inputFormatClazz.getName.
+      equalsIgnoreCase("org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat")
+      && classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz)) {

Review comment:
       Spark 2.4.3 works well in my tests.
   Spark 3.0.0 creates a NewHadoopRDD, whose getPartitions method calls 'val allRowSplits = inputFormat.getSplits(new JobContextImpl(_conf, jobId)).asScala'. That in turn calls TableInputFormatBase.getSplits(JobContext context), but the variable table is null, so an exception is thrown.
   
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-757026185


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133854/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-757001905


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38443/
   




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29178:
URL: https://github.com/apache/spark/pull/29178#discussion_r458185884



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
##########
@@ -299,7 +299,9 @@ class HadoopTableReader(
    */
   private def createHadoopRDD(localTableDesc: TableDesc, inputPathStr: String): RDD[Writable] = {
     val inputFormatClazz = localTableDesc.getInputFileFormatClass
-    if (classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz)) {
+    if (!inputFormatClazz.getName.
+      equalsIgnoreCase("org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat")
+      && classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz)) {

Review comment:
       Do you think we can have a test case for this?






[GitHub] [spark] premsagarreddy removed a comment on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
premsagarreddy removed a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-825436315


   @HyukjinKwon could you please share the steps to resolve the problem of Spark 3.0 accessing a Hive table whose data is in HBase?






[GitHub] [spark] SparkQA commented on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-756950631








[GitHub] [spark] dongjoon-hyun removed a comment on pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun removed a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661934294


   BTW, @DeyinZhong. IIRC, Apache Spark doesn't support Hive Storage Handlers officially. So, I guess `HBaseStorageHandler` (this PR) and `DruidStorageHandler` might be in the same situation. In that case, we need a more general solution.




[GitHub] [spark] SparkQA commented on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-757009731


   **[Test build #133854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133854/testReport)** for PR 29178 at commit [`d170e24`](https://github.com/apache/spark/commit/d170e2463cc7865f2907a861f6d77bbd3e92ca23).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] yangBottle commented on a change in pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
yangBottle commented on a change in pull request #29178:
URL: https://github.com/apache/spark/pull/29178#discussion_r553872869



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
##########
@@ -299,7 +299,9 @@ class HadoopTableReader(
    */
   private def createHadoopRDD(localTableDesc: TableDesc, inputPathStr: String): RDD[Writable] = {
     val inputFormatClazz = localTableDesc.getInputFileFormatClass
-    if (classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz)) {
+    if (!inputFormatClazz.getName.
+      equalsIgnoreCase("org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat")

Review comment:
       It looks like the new MapReduce API (`org.apache.hadoop.mapreduce`) is used when creating NewHadoopRDD, but the getSplits method of HiveHBaseTableInputFormat is implemented against the org.apache.hadoop.mapred API, so some initialization (Table, Connection) is not done and the variable table is null. The createOldHadoopRDD method uses the org.apache.hadoop.mapred API, which performs that initialization, so it works well.
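   
   A minimal sketch of the behaviour described above, assuming the old-API entry point is the only place where the table gets initialized. All class names here are hypothetical stand-ins rather than HBase or Hive classes, and the exception type is chosen only for illustration.
   
   ```scala
   // Hypothetical stand-ins for JobConf (old API) and JobContext (new API).
   final case class OldApiConf(tableName: String)
   final case class NewApiContext(tableName: String)

   // Plays the role of a new-API base class that expects `table` to be set beforehand.
   class BaseTableFormat {
     protected var table: Option[String] = None

     // New-API entry point: fails when nothing has initialized `table`.
     def getSplits(context: NewApiContext): Seq[String] = table match {
       case Some(name) => Seq(s"$name-split-0")
       case None => throw new IllegalStateException("table is null")
     }
   }

   // Plays the role of a format that is also reachable through the old API.
   class DualApiTableFormat extends BaseTableFormat {
     // Old-API entry point: initializes `table`, then delegates to the new-API logic.
     def getSplits(conf: OldApiConf, numSplits: Int): Seq[String] = {
       table = Some(conf.tableName)
       super.getSplits(NewApiContext(conf.tableName))
     }
   }
   ```
   
   Calling `getSplits(NewApiContext(...))` on a fresh `DualApiTableFormat` throws, while the two-argument overload succeeds because it sets `table` first.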






[GitHub] [spark] HyukjinKwon closed pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #29178:
URL: https://github.com/apache/spark/pull/29178


   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661935354








[GitHub] [spark] SparkQA commented on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-756950631


   **[Test build #133854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133854/testReport)** for PR 29178 at commit [`d170e24`](https://github.com/apache/spark/commit/d170e2463cc7865f2907a861f6d77bbd3e92ca23).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29178:
URL: https://github.com/apache/spark/pull/29178#discussion_r458527958



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
##########
@@ -299,7 +299,9 @@ class HadoopTableReader(
    */
   private def createHadoopRDD(localTableDesc: TableDesc, inputPathStr: String): RDD[Writable] = {
     val inputFormatClazz = localTableDesc.getInputFileFormatClass
-    if (classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz)) {
+    if (!inputFormatClazz.getName.
+      equalsIgnoreCase("org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat")

Review comment:
       Do you know why `org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat` implements the new Hadoop InputFormat interface but doesn't work?






[GitHub] [spark] dongjoon-hyun commented on pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661934294


   BTW, @DeyinZhong, IIRC Apache Spark doesn't officially support Hive Storage Handlers. So I guess `HBaseStorageHandler` (this PR) and `DruidStorageHandler` might be in the same situation. In that case, we need a more general solution.




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661931705


   Since this might be a regression due to SPARK-26630, cc @gatorsmile and @HyukjinKwon.




[GitHub] [spark] SparkQA commented on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-662030652


   **[Test build #126261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126261/testReport)** for PR 29178 at commit [`d170e24`](https://github.com/apache/spark/commit/d170e2463cc7865f2907a861f6d77bbd3e92ca23).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661880089


   Can one of the admins verify this patch?




[GitHub] [spark] SparkQA commented on pull request #29178: [SPARK-32380] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661934576


   **[Test build #126261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126261/testReport)** for PR 29178 at commit [`d170e24`](https://github.com/apache/spark/commit/d170e2463cc7865f2907a861f6d77bbd3e92ca23).




[GitHub] [spark] AmplabJenkins commented on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-757001905


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38443/
   




[GitHub] [spark] AmplabJenkins commented on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-662031150








[GitHub] [spark] AmplabJenkins commented on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-688598014


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-688598014


   Can one of the admins verify this patch?




[GitHub] [spark] github-actions[bot] commented on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-747802142


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] yangBottle commented on a change in pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
yangBottle commented on a change in pull request #29178:
URL: https://github.com/apache/spark/pull/29178#discussion_r553872869



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
##########
@@ -299,7 +299,9 @@ class HadoopTableReader(
    */
   private def createHadoopRDD(localTableDesc: TableDesc, inputPathStr: String): RDD[Writable] = {
     val inputFormatClazz = localTableDesc.getInputFileFormatClass
-    if (classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz)) {
+    if (!inputFormatClazz.getName.
+      equalsIgnoreCase("org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat")

Review comment:
       It looks like the new MapReduce API (`org.apache.hadoop.mapreduce`) is used when creating `NewHadoopRDD`, but the `getSplits` method of `HiveHBaseTableInputFormat` is implemented against the `org.apache.hadoop.mapred` API, so some initialization (table, connection) is not done and the resulting `table` variable is null. When `createOldHadoopRDD` is used instead, the `org.apache.hadoop.mapred` API is used and that initialization (table, connection) is performed, so it works.
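
A minimal sketch of the dispatch problem described above, not the actual Spark code. `OldApiInputFormat`, `NewApiInputFormat`, `DualApiFormat`, `OldOnlyFormat` and `RddDispatchSketch` are hypothetical stand-ins so the example runs without Hadoop on the classpath. The diff shown in this thread is truncated after the name check, so the remainder of the patched condition is an assumption based on the PR description.

```scala
// Hypothetical stand-ins for the two Hadoop input format APIs.
trait OldApiInputFormat            // plays the role of org.apache.hadoop.mapred.InputFormat
abstract class NewApiInputFormat   // plays the role of org.apache.hadoop.mapreduce.InputFormat

// A format visible through both APIs, as the review comments describe
// for HiveHBaseTableInputFormat.
class DualApiFormat extends NewApiInputFormat with OldApiInputFormat
// A format that only implements the old API.
class OldOnlyFormat extends OldApiInputFormat

object RddDispatchSketch {
  // Original check: any class assignable to the new API takes the new-API path.
  def original(clazz: Class[_]): String =
    if (classOf[NewApiInputFormat].isAssignableFrom(clazz)) "new" else "old"

  // Patched check (assumed shape): a class whose name matches `excludedName`
  // is forced onto the old-API path even though it is assignable to the new API.
  def patched(clazz: Class[_], excludedName: String): String =
    if (!clazz.getName.equalsIgnoreCase(excludedName) &&
        classOf[NewApiInputFormat].isAssignableFrom(clazz)) "new"
    else "old"
}
```

With these stand-ins, `original` sends `DualApiFormat` down the new-API path, while `patched` called with that class's own name as `excludedName` sends it down the old-API path. Matching on a class name is the narrow fix proposed in this PR; a more general solution, as requested earlier in the thread, would need a different criterion.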






[GitHub] [spark] SparkQA removed a comment on pull request #29178: [SPARK-32380][SQL] fixed spark3.0 access hive table while data in hbase problem

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29178:
URL: https://github.com/apache/spark/pull/29178#issuecomment-661934576


   **[Test build #126261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126261/testReport)** for PR 29178 at commit [`d170e24`](https://github.com/apache/spark/commit/d170e2463cc7865f2907a861f6d77bbd3e92ca23).

