Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/05/11 10:37:00 UTC

[jira] [Work logged] (HIVE-25101) Remove HBase libraries from HBase distribution

     [ https://issues.apache.org/jira/browse/HIVE-25101?focusedWorklogId=594463&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-594463 ]

ASF GitHub Bot logged work on HIVE-25101:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/May/21 10:36
            Start Date: 11/May/21 10:36
    Worklog Time Spent: 10m 
      Work Description: stoty opened a new pull request #2259:
URL: https://github.com/apache/hive/pull/2259


   ### What changes were proposed in this pull request?
   - Mark all HBase dependencies as provided, so that they are not included in the distribution package
   - Add the hbase mapredcp jars to AUX_PARAM in the hive startup script
   - Remove the code that adds the hbase dependencies to the MR/TEZ jobs automatically
   - Let HiveSparkClientFactory work even if it cannot access the HBase libraries
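
As a rough illustration of the second bullet, the following is a minimal sketch of how a startup script could append the `hbase mapredcp` output to AUX_PARAM. It is not the code from this pull request: the function name `add_hbase_mapredcp` is hypothetical, and the comma-separated format of AUX_PARAM is an assumption. The sketch relies only on `hbase mapredcp` printing a colon-separated classpath, and on HBASE_HOME pointing at an HBase installation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual bin/hive code from the pull request.
# Appends the jars reported by `hbase mapredcp` to AUX_PARAM when HBASE_HOME
# is set; otherwise AUX_PARAM is left untouched.

add_hbase_mapredcp() {
  # Without HBASE_HOME the HBase libraries are simply not added.
  if [ -z "${HBASE_HOME:-}" ]; then
    return 0
  fi

  local hbase_bin="${HBASE_HOME}/bin/hbase"
  if [ ! -x "$hbase_bin" ]; then
    return 0
  fi

  # `hbase mapredcp` prints a colon-separated classpath.
  local cp
  cp="$("$hbase_bin" mapredcp 2>/dev/null)" || return 0
  if [ -z "$cp" ]; then
    return 0
  fi

  # Assumption: AUX_PARAM is a comma-separated list of jars.
  cp="$(printf '%s' "$cp" | tr ':' ',')"
  AUX_PARAM="${AUX_PARAM:+${AUX_PARAM},}${cp}"
}

# Example call (commented out so the sketch has no side effects when sourced):
# add_hbase_mapredcp; echo "$AUX_PARAM"
```

With this shape, an unset HBASE_HOME leaves AUX_PARAM unchanged, which matches the user-facing behaviour described in this pull request: no HBASE_HOME, no HBase libraries in Hive.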
   
   ### Why are the changes needed?
   Hive currently includes the HBase libraries in its distribution package, and also adds the HBase libraries separately 
   via the 'hbase mapredcp' command in the startup script.
   
   There are multiple problems with this:
   - It's redundant
   - The included HBase libraries are ancient, and cannot be easily updated 
   - The hbase-shaded-mapredcp.jar added by hbase mapredcp is incompatible with the included unshaded libraries at the API level. Because the ordering of the classes can differ, the effective API can also differ between the local and the MR/Tez classpath
   
   ### Does this PR introduce _any_ user-facing change?
   With this change, if the HBASE_HOME environment variable is not set, then the HBase libraries will not
   be available in Hive. This is a change from the current behaviour.
   Also, with this change there is no need to manually replace the HBase libraries in the Hive distribution.
   
   ### How was this patch tested?
   I have built a local pseudo-distributed cluster, and checked that the HBase handler works with these changes.
   AFAIK Hive does not have end-to-end tests that can test the generated package and start-up scripts.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 594463)
    Remaining Estimate: 0h
            Time Spent: 10m

> Remove HBase libraries from HBase distribution
> ----------------------------------------------
>
>                 Key: HIVE-25101
>                 URL: https://issues.apache.org/jira/browse/HIVE-25101
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler, Hive
>    Affects Versions: 4.0.0
>            Reporter: Istvan Toth
>            Assignee: Istvan Toth
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive currently packages HBase libraries into its lib directory.
> It also adds the HBase libraries separately to its classpath in the hive startup script.
> Having both mechanisms is redundant, and it also causes errors: the standard HBase libraries packaged into Hive are unshaded, while the libraries added by _hbase mapredcp_
> are shaded. The two are NOT compatible when custom coprocessors are used, and in some cases the classpaths during local execution and for MR/TEZ jobs are mutually incompatible.
> I propose removing all HBase libraries from the distribution, and pulling them via the hbase mapredcp mechanism.
> This also solves the old problem of including ancient HBase alpha versions in Hive.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)