Posted to gitbox@hive.apache.org by "alexdongli0829 (via GitHub)" <gi...@apache.org> on 2023/02/15 07:03:59 UTC

[GitHub] [hive] alexdongli0829 opened a new pull request, #4063: HIVE-24706: add the HiveHBaseTableInputFormatV2 to fix the compatible issue with spark

alexdongli0829 opened a new pull request, #4063:
URL: https://github.com/apache/hive/pull/4063

   
   ### What changes were proposed in this pull request?
   
   For HIVE-24706, the main issue is that HiveHBaseTableInputFormat implements both versions of the InputFormat API, so Spark cannot determine which version to use. The current implementation is also not very clear.
   
   So in this PR, instead of extending TableInputFormatBase directly, I use it as a delegate that does exactly the same work as before. This avoids the confusion, because HBaseStorageHandler only needs the old-version InputFormat.
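
The delegation idea above can be sketched roughly as follows. This is only an illustration with stand-in types: `OldInputFormat`, `NewStyleTableInput`, and `DelegatingInputFormat` are hypothetical names, not the Hadoop, HBase, or Hive classes, and the split logic is a placeholder.

```java
import java.util.List;

public class DelegateSketch {
    // Stand-in for the old-style InputFormat contract (hypothetical, not the Hadoop interface).
    interface OldInputFormat {
        List<String> getSplits(String table);
    }

    // Stand-in for the new-style base class that was previously extended directly.
    static class NewStyleTableInput {
        List<String> computeSplits(String table) {
            return List.of(table + "-split-0", table + "-split-1");
        }
    }

    // Implements only the old contract and forwards the work to a delegate,
    // so the class exposes a single InputFormat API to its callers.
    static class DelegatingInputFormat implements OldInputFormat {
        private final NewStyleTableInput delegate = new NewStyleTableInput();

        @Override
        public List<String> getSplits(String table) {
            return delegate.computeSplits(table);
        }
    }

    public static void main(String[] args) {
        OldInputFormat format = new DelegatingInputFormat();
        System.out.println(format.getSplits("students"));
    }
}
```

Because the wrapper holds the new-style object as a field rather than inheriting from it, code that inspects the class hierarchy sees only the old-style contract.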
   
   In the long term, I think Hive should update the storage handler instead of continuing to mix these two API versions.
   
   
   ### Why are the changes needed?
   
   It affects compatibility between Spark and Hive, and has been reported by several users of both Hive and Spark.
   
   
   ### Does this PR introduce _any_ user-facing change?
   A configuration parameter, hive.hbase.inputformat.v2, is added, so the documentation may need an update to keep end users informed.
   
   
   ### How was this patch tested?
   
   Create the HBase table:
   
   ```
   echo "create 'students','account','address'" | sudo -u hbase hbase shell -n
   echo "put 'students','student1','account:name','Alice'" |sudo -u hbase hbase shell -n
   ```
   
   Create the Hive table:
   ```
   hive -e "create external table test1 (key string, value string)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties ('hbase.columns.mapping' = ':key,account:name')
   tblproperties ('hbase.table.name' = 'students')"
   
   
   SLF4J: Class path contains multiple SLF4J bindings.
   Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j2.properties Async: true
   Hive Session ID = 05b4ec22-d15f-4614-9bc3-6c183e868728
   OK
   Time taken: 2.913 seconds
   ```
   
   Spark test:
   
   spark-sql --jars /usr/lib/hive/lib/hive-hbase-handler.jar,/usr/lib/hbase/hbase-common-2.4.4.jar,/usr/lib/hbase/hbase-client-2.4.4.jar,/usr/lib/hbase/lib/hbase-mapreduce-2.4.4.jar,/usr/lib/hbase/lib/shaded-clients/hbase-shaded-client-2.4.4.jar --conf spark.hive.hbase.inputformat.v2=true
   
   ```
   spark-sql> select * from test1;
   
   student1    Alice
   ```
   
   Unit test:
   
   ```
   [INFO] -------------------------------------------------------
   [INFO]  T E S T S
   [INFO] -------------------------------------------------------
   [INFO] Running org.apache.hadoop.hive.hbase.TestHiveHBaseTableInputFormatV2
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.069 s - in org.apache.hadoop.hive.hbase.TestHiveHBaseTableInputFormatV2
   [INFO]
   [INFO] Results:
   [INFO]
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] sonarcloud[bot] commented on pull request #4063: HIVE-24706: add the HiveHBaseTableInputFormatV2 to fix the compatible issue with spark

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4063:
URL: https://github.com/apache/hive/pull/4063#issuecomment-1431409145

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4063)
   
   [7 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4063&resolved=false&types=BUG) (rating E)  
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4063&resolved=false&types=VULNERABILITY) (rating A)  
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4063&resolved=false&types=SECURITY_HOTSPOT) (rating A)  
   [26 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4063&resolved=false&types=CODE_SMELL) (rating A)
   
   [No Coverage information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4063&metric=coverage&view=list)  
   [No Duplication information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4063&metric=duplicated_lines_density&view=list)
   
   




[GitHub] [hive] TuroczyX commented on pull request #4063: HIVE-24706: add the HiveHBaseTableInputFormatV2 to fix the compatible issue with spark

Posted by "TuroczyX (via GitHub)" <gi...@apache.org>.
TuroczyX commented on PR #4063:
URL: https://github.com/apache/hive/pull/4063#issuecomment-1495706964

   I don't get it. As far as I can see, there is a test failure in the javadoc part, which seems related. So I would say yes, it should be fixed.




[GitHub] [hive] alexdongli0829 closed pull request #4063: HIVE-24706: add the HiveHBaseTableInputFormatV2 to fix the compatible issue with spark

Posted by "alexdongli0829 (via GitHub)" <gi...@apache.org>.
alexdongli0829 closed pull request #4063: HIVE-24706: add the HiveHBaseTableInputFormatV2 to fix the compatible issue with spark
URL: https://github.com/apache/hive/pull/4063




[GitHub] [hive] alexdongli0829 commented on pull request #4063: HIVE-24706: add the HiveHBaseTableInputFormatV2 to fix the compatible issue with spark

Posted by "alexdongli0829 (via GitHub)" <gi...@apache.org>.
alexdongli0829 commented on PR #4063:
URL: https://github.com/apache/hive/pull/4063#issuecomment-1436152068

   Hey guys,
   
   I am looking into the bugs that caused the test failure:
   
   1. Regarding the 2 issues where Job is not closed and the 4 InterruptedException issues: I think I understand what the report is saying, but is it really necessary to fix them? Most of the code comes from HiveHBaseTableInputFormat, so why is there no test failure on HiveHBaseTableInputFormat?
   
   2. For the RecordReader, I need to return it from this function, so how can I close it inside getRecordReader?
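
For the question in point 2, one common pattern is to hand ownership of the returned object to the caller and to close the underlying resource inside the method only when setup fails. The sketch below uses stand-in types: `Connection`, `Reader`, and `openReader` are hypothetical names, not the Hadoop or HBase API, and whether this satisfies the SonarCloud rule for this PR is an assumption.

```java
public class ReaderOwnershipSketch {
    // Stand-in for a connection-like resource (hypothetical, not an HBase class).
    static class Connection implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    // Stand-in for a record reader that owns its connection and releases it on close().
    static class Reader implements AutoCloseable {
        private final Connection conn;

        Reader(Connection conn) {
            this.conn = conn;
        }

        Connection connection() {
            return conn;
        }

        @Override
        public void close() {
            conn.close();
        }
    }

    // Returns an open reader; the connection is closed here only if setup fails.
    static Reader openReader(boolean failDuringSetup) {
        Connection conn = new Connection();
        try {
            if (failDuringSetup) {
                throw new IllegalStateException("setup failed");
            }
            return new Reader(conn);
        } catch (RuntimeException e) {
            conn.close();
            throw e;
        }
    }

    public static void main(String[] args) {
        // The caller owns the reader and closes it via try-with-resources.
        try (Reader reader = openReader(false)) {
            System.out.println("closed while in use: " + reader.connection().closed);
        }
    }
}
```

In this arrangement the factory method never closes a reader it successfully returns; the caller is responsible for calling `close()` when it has finished reading.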

