Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/02/15 07:04:00 UTC

[jira] [Work logged] (HIVE-24706) Spark SQL access hive on HBase table access exception

     [ https://issues.apache.org/jira/browse/HIVE-24706?focusedWorklogId=845546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-845546 ]

ASF GitHub Bot logged work on HIVE-24706:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Feb/23 07:04
            Start Date: 15/Feb/23 07:04
    Worklog Time Spent: 10m 
      Work Description: alexdongli0829 opened a new pull request, #4063:
URL: https://github.com/apache/hive/pull/4063

   
   ### What changes were proposed in this pull request?
   
   For HIVE-24706, the main issue is that HiveHBaseTableInputFormat implements both versions of the InputFormat API (org.apache.hadoop.mapred and org.apache.hadoop.mapreduce), so Spark cannot determine which version to use. This dual implementation is also unclear in itself.
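   
   To illustrate the ambiguity, here is a minimal, self-contained sketch. It does not use the real Hadoop or HBase classes: `OldInputFormat`, `NewInputFormat` and `DualFormat` are hypothetical stand-ins that only mirror the shape of the problem (an old-API interface, a new-API abstract class, and one class that is both). A caller that picks the API with `isAssignableFrom` gets a match on both checks.
   
   ```java
   public class DualApiAmbiguity {
       /** Hypothetical stand-in for org.apache.hadoop.mapred.InputFormat (old API, an interface). */
       interface OldInputFormat {}

       /** Hypothetical stand-in for org.apache.hadoop.mapreduce.InputFormat (new API, an abstract class). */
       abstract static class NewInputFormat {}

       /** Mirrors the problematic shape: a single class that belongs to both APIs. */
       static class DualFormat extends NewInputFormat implements OldInputFormat {}

       /** True when the class can be used as an old-API input format. */
       static boolean isOld(Class<?> c) {
           return OldInputFormat.class.isAssignableFrom(c);
       }

       /** True when the class can be used as a new-API input format. */
       static boolean isNew(Class<?> c) {
           return NewInputFormat.class.isAssignableFrom(c);
       }

       public static void main(String[] args) {
           // Both checks succeed, so a caller cannot tell which API was intended.
           System.out.println("old API match: " + isOld(DualFormat.class));
           System.out.println("new API match: " + isNew(DualFormat.class));
       }
   }
   ```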
   
   So in this pull request, instead of extending TableInputFormatBase directly, I use it as a delegate that does exactly the same work as before. This avoids the confusion, because HBaseStorageHandler only needs the old-version InputFormat.
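   
   The delegate idea can be sketched as follows. This is not the code in the pull request: every type and method name here (`OldInputFormat`, `NewInputFormatBase`, `NewApiDelegate`, `OldOnlyFormat`, `computeSplits`) is a hypothetical stand-in, and the split list is dummy data. The point is only that a class which implements the old API and forwards to a new-API object is no longer assignable to the new-API type.
   
   ```java
   import java.util.Arrays;
   import java.util.List;

   public class DelegateSketch {
       /** Hypothetical stand-in for the old (mapred) InputFormat interface. */
       interface OldInputFormat {
           List<String> getSplits();
       }

       /** Hypothetical stand-in for a new-API base class that holds the real logic. */
       abstract static class NewInputFormatBase {
           List<String> computeSplits() {
               // Dummy data standing in for real split computation.
               return Arrays.asList("split-0", "split-1");
           }
       }

       /** Concrete new-API helper, used only through composition. */
       static class NewApiDelegate extends NewInputFormatBase {}

       /** Implements only the old API and forwards the work to the delegate. */
       static class OldOnlyFormat implements OldInputFormat {
           private final NewInputFormatBase delegate = new NewApiDelegate();

           @Override
           public List<String> getSplits() {
               return delegate.computeSplits();
           }
       }

       public static void main(String[] args) {
           Class<?> c = OldOnlyFormat.class;
           System.out.println("old API match: " + OldInputFormat.class.isAssignableFrom(c));
           System.out.println("new API match: " + NewInputFormatBase.class.isAssignableFrom(c));
           System.out.println("splits: " + new OldOnlyFormat().getSplits());
       }
   }
   ```
   
   With this shape, only the old-API check matches, while the behaviour still comes from the new-API object.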
   
   In the long term, I think Hive should update the storage handler instead of continuing to mix these two API versions.
   
   
   ### Why are the changes needed?
   
   It is affecting Spark and Hive compatibility and has been reported by different users of both Hive and Spark.
   
   
   ### Does this PR introduce _any_ user-facing change?
   A configuration parameter, hive.hbase.inputformat.v2, is added, so the documentation may need an update to keep end users informed.
   
   
   ### How was this patch tested?
   
   Create the HBase table:
   
   ```
   echo "create 'students','account','address'" | sudo -u hbase hbase shell -n
   echo "put 'students','student1','account:name','Alice'" | sudo -u hbase hbase shell -n
   ```
   
   Create the Hive table:
   ```
   hive -e "create external table test1 (key string, value string)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties ('hbase.columns.mapping' = ':key,account:name')
   tblproperties ('hbase.table.name' = 'students')"
   
   
   SLF4J: Class path contains multiple SLF4J bindings.
   Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j2.properties Async: true
   Hive Session ID = 05b4ec22-d15f-4614-9bc3-6c183e868728
   OK
   Time taken: 2.913 seconds
   ```
   
   Spark test:
   
   spark-sql --jars /usr/lib/hive/lib/hive-hbase-handler.jar,/usr/lib/hbase/hbase-common-2.4.4.jar,/usr/lib/hbase/hbase-client-2.4.4.jar,/usr/lib/hbase/lib/hbase-mapreduce-2.4.4.jar,/usr/lib/hbase/lib/shaded-clients/hbase-shaded-client-2.4.4.jar --conf spark.hive.hbase.inputformat.v2=true
   
   ```
   spark-sql> select * from test1;
   
   student1    Alice
   ```
   
   Unit Test
   
   ```
   [INFO] -------------------------------------------------------
   [INFO]  T E S T S
   [INFO] -------------------------------------------------------
   [INFO] Running org.apache.hadoop.hive.hbase.TestHiveHBaseTableInputFormatV2
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.069 s - in org.apache.hadoop.hive.hbase.TestHiveHBaseTableInputFormatV2
   [INFO]
   [INFO] Results:
   [INFO]
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
   ```
   




Issue Time Tracking
-------------------

            Worklog Id:     (was: 845546)
    Remaining Estimate: 0h
            Time Spent: 10m

> Spark SQL access hive on HBase table access exception
> -----------------------------------------------------
>
>                 Key: HIVE-24706
>                 URL: https://issues.apache.org/jira/browse/HIVE-24706
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler
>            Reporter: zhangzhanchang
>            Priority: Major
>         Attachments: image-2021-01-30-15-51-58-665.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> HiveHBaseTableInputFormat relies on two versions of InputFormat: one is org.apache.hadoop.mapred.InputFormat, the other is org.apache.hadoop.mapreduce.InputFormat. This causes both conditions in
> Spark 3.0 (https://github.com/apache/spark/pull/31302) to be true:
>  # classOf[oldInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
>  # classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
> !image-2021-01-30-15-51-58-665.png|width=430,height=137!
> Should HiveHBaseTableInputFormat be changed to rely on only org.apache.hadoop.mapreduce or only org.apache.hadoop.mapred?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)