Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/05/11 10:36:17 UTC

[GitHub] [hive] stoty opened a new pull request #2259: [WIP] HIVE-25101: Remove HBase libraries from Hive distribution

stoty opened a new pull request #2259:
URL: https://github.com/apache/hive/pull/2259


   ### What changes were proposed in this pull request?
   - Mark all HBase dependencies as provided, so that they are not included in the distribution package
   - Add the hbase mapredcp jars to AUX_PARAM in the hive startup script
   - Remove the code that adds the hbase dependencies to the MR/TEZ jobs automatically
   - Let HiveSparkClientFactory work even if it cannot access the HBase libraries
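   
   The second bullet can be sketched roughly as follows. This is a minimal illustration, not the script change from the patch: it assumes that `hbase mapredcp` prints a colon-separated classpath and that AUX_PARAM holds a comma-separated list of file:// URIs. The helper name `classpath_to_auxpath` and the variable `HBASE_AUX` are hypothetical.
   
   ```shell
   # Hypothetical helper: turn a colon-separated classpath into a
   # comma-separated list of file:// URIs, skipping empty entries.
   classpath_to_auxpath() {
     local cp="$1" out="" entry
     local IFS=':'
     for entry in $cp; do
       [ -z "$entry" ] && continue
       out="${out:+$out,}file://$entry"
     done
     printf '%s\n' "$out"
   }

   # Only consult HBase when HBASE_HOME points at a usable installation.
   if [ -n "${HBASE_HOME:-}" ] && [ -x "${HBASE_HOME}/bin/hbase" ]; then
     HBASE_AUX=$(classpath_to_auxpath "$("${HBASE_HOME}/bin/hbase" mapredcp)")
     # Append to any existing value instead of replacing it (an assumption
     # of this sketch, not necessarily what the patched script does).
     AUX_PARAM="${AUX_PARAM:+$AUX_PARAM,}${HBASE_AUX}"
   fi
   ```
   
   When HBASE_HOME is unset, the guard leaves AUX_PARAM untouched, which matches the user-facing change described in this pull request: no HBase libraries are available to Hive in that case.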
   
   ### Why are the changes needed?
   Hive currently includes the HBase libraries in its distribution package, and also adds the HBase libraries separately
   via the 'hbase mapredcp' command in the startup script.
   
   There are multiple problems with this:
   - It's redundant
   - The included HBase libraries are ancient, and cannot be easily updated 
   - The hbase-shaded-mapredcp.jar added by hbase mapredcp is incompatible with the included unshaded libraries at the API level. Because the ordering of the classes can differ, the effective API can differ between the local and the MR/Tez classpath
   
   ### Does this PR introduce _any_ user-facing change?
   With this change, if the HBASE_HOME environment variable is not set, then the HBase libraries will not
   be available in Hive. This is a change from the current behaviour.
   Also, with this change there is no need to manually replace the HBase libraries in the Hive distribution.
   
   ### How was this patch tested?
   I have built a local pseudodistributed cluster, and checked that the hbase handler works with these changes.
   AFAIK Hive does not have end-to-end tests that can test the generated package and start-up scripts.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] stoty commented on pull request #2259: HIVE-25101: Remove HBase libraries from Hive distribution

Posted by GitBox <gi...@apache.org>.
stoty commented on pull request #2259:
URL: https://github.com/apache/hive/pull/2259#issuecomment-857376643


   I have noticed one more thing while testing this change.
   The hive script changes will always overwrite the hbase.aux.jar.path configuration parameter.
   
   Now a lot of other settings, like having an auxjars directory, or setting HIVE_AUX_JARS_PATH, will do the same,
   but this change will overwrite the hbase.aux.jar.path set in hbase-site.xml pretty much every single time.
   
   I'm not sure how much of a problem this is, but I wanted to give a heads-up.
   
   I could explore reverting to using the HADOOP_CLASSPATH instead, though I have doubts if that actually works for the distributed operations.
   
   @kgyrtkirk 




[GitHub] [hive] stoty commented on pull request #2259: HIVE-25101: Remove HBase libraries from Hive distribution

Posted by GitBox <gi...@apache.org>.
stoty commented on pull request #2259:
URL: https://github.com/apache/hive/pull/2259#issuecomment-852867330


   This is the difference between the llap yarn archive contents with and without the patch:
   
   Only in new/lib: audience-annotations-0.5.0.jar
   Only in old/lib: commons-lang3-3.9.jar
   Only in new/lib: commons-logging-1.2.jar
   Only in old/lib: hadoop-common-3.1.0.jar
   Only in old/lib: hadoop-mapreduce-client-core-3.1.0.jar
   Only in old/lib: hbase-client-2.0.0-alpha4.jar
   Only in old/lib: hbase-common-2.0.0-alpha4.jar
   Only in old/lib: hbase-hadoop-compat-2.0.0-alpha4.jar
   Only in old/lib: hbase-hadoop2-compat-2.0.0-alpha4.jar
   Only in old/lib: hbase-mapreduce-2.0.0-alpha4.jar
   Only in old/lib: hbase-metrics-2.0.0-alpha4.jar
   Only in old/lib: hbase-metrics-api-2.0.0-alpha4.jar
   Only in old/lib: hbase-prefix-tree-2.0.0-alpha4.jar
   Only in old/lib: hbase-protocol-2.0.0-alpha4.jar
   Only in old/lib: hbase-protocol-shaded-2.0.0-alpha4.jar
   Only in old/lib: hbase-server-2.0.0-alpha4.jar
   Only in new/lib: hbase-shaded-mapreduce-2.4.2.jar
   Only in old/lib: hbase-shaded-miscellaneous-1.0.1.jar
   Only in old/lib: hbase-shaded-netty-1.0.1.jar
   Only in old/lib: hbase-shaded-protobuf-1.0.1.jar
   Only in old/lib: htrace-core-3.1.0-incubating.jar
   Only in new/lib: htrace-core4-4.2.0-incubating.jar
   Only in old/lib: metrics-core-3.1.0.jar
   Only in old/lib: zookeeper-3.5.5.jar
   
   




[GitHub] [hive] kgyrtkirk merged pull request #2259: HIVE-25101: Remove HBase libraries from Hive distribution

Posted by GitBox <gi...@apache.org>.
kgyrtkirk merged pull request #2259:
URL: https://github.com/apache/hive/pull/2259


   




[GitHub] [hive] stoty edited a comment on pull request #2259: HIVE-25101: Remove HBase libraries from Hive distribution

Posted by GitBox <gi...@apache.org>.
stoty edited a comment on pull request #2259:
URL: https://github.com/apache/hive/pull/2259#issuecomment-842975995


   Here's what gets changed/removed from the distribution with the patch.
   
   Apart from the obvious HBase libraries, there are some HBase dependencies.
   The big one is Jersey 2.25. AFAICT, where Hive uses Jersey, it includes the older com.sun.jersey artifacts from Hadoop, so this shouldn't be a problem.
   I have added some missing explicit dependencies where the Hive source used undeclared dependencies for validation-api and commons-math3.
   
   
   ```
   diff  <(cd hbase;find . |sort) <(cd nohbase;find . |sort)
   515d514
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/aopalliance-repackaged-2.5.0-b32.jar
   574c573
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/disruptor-3.3.6.jar
   ---
   > ./apache-hive-4.0.0-SNAPSHOT-bin/lib/disruptor-3.3.7.jar
   578d576
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/findbugs-annotations-1.3.9-1.jar
   584,602d581
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-client-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-common-2.0.0-alpha4-tests.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-common-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-hadoop-compat-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-hadoop2-compat-2.0.0-alpha4-tests.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-hadoop2-compat-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-http-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-mapreduce-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-metrics-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-metrics-api-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-prefix-tree-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-procedure-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-protocol-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-protocol-shaded-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-replication-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-server-2.0.0-alpha4.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-shaded-miscellaneous-1.0.1.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-shaded-netty-1.0.1.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hbase-shaded-protobuf-1.0.1.jar
   641,643d619
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hk2-api-2.5.0-b32.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hk2-locator-2.5.0-b32.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/hk2-utils-2.5.0-b32.jar
   645c621
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/htrace-core-3.2.0-incubating.jar
   ---
   > ./apache-hive-4.0.0-SNAPSHOT-bin/lib/htrace-core-3.1.0-incubating.jar
   671d646
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/javax.inject-2.5.0-b32.jar
   680d654
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/jcodings-1.0.18.jar
   683,687d656
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/jersey-client-2.25.1.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/jersey-common-2.25.1.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/jersey-container-servlet-core-2.25.1.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/jersey-guava-2.25.1.jar
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/jersey-media-jaxb-2.25.1.jar
   689d657
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/jersey-server-2.25.1.jar
   720d687
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/joni-2.1.11.jar
   777d743
   < ./apache-hive-4.0.0-SNAPSHOT-bin/lib/osgi-resource-locator-1.0.1.jar
   ```




[GitHub] [hive] stoty edited a comment on pull request #2259: HIVE-25101: Remove HBase libraries from Hive distribution

Posted by GitBox <gi...@apache.org>.
stoty edited a comment on pull request #2259:
URL: https://github.com/apache/hive/pull/2259#issuecomment-852867330


   This is the difference between the llap yarn archive contents with and without the patch:
   
   Removed:
   Only in old/lib: commons-lang3-3.9.jar
   Only in old/lib: hadoop-common-3.1.0.jar
   Only in old/lib: hadoop-mapreduce-client-core-3.1.0.jar
   Only in old/lib: hbase-client-2.0.0-alpha4.jar
   Only in old/lib: hbase-common-2.0.0-alpha4.jar
   Only in old/lib: hbase-hadoop-compat-2.0.0-alpha4.jar
   Only in old/lib: hbase-hadoop2-compat-2.0.0-alpha4.jar
   Only in old/lib: hbase-mapreduce-2.0.0-alpha4.jar
   Only in old/lib: hbase-metrics-2.0.0-alpha4.jar
   Only in old/lib: hbase-metrics-api-2.0.0-alpha4.jar
   Only in old/lib: hbase-prefix-tree-2.0.0-alpha4.jar
   Only in old/lib: hbase-protocol-2.0.0-alpha4.jar
   Only in old/lib: hbase-protocol-shaded-2.0.0-alpha4.jar
   Only in old/lib: hbase-server-2.0.0-alpha4.jar
   Only in old/lib: hbase-shaded-miscellaneous-1.0.1.jar
   Only in old/lib: hbase-shaded-netty-1.0.1.jar
   Only in old/lib: hbase-shaded-protobuf-1.0.1.jar
   Only in old/lib: htrace-core-3.1.0-incubating.jar
   Only in old/lib: metrics-core-3.1.0.jar
   Only in old/lib: zookeeper-3.5.5.jar
   
   Added:
   Only in new/lib: audience-annotations-0.5.0.jar
   Only in new/lib: commons-logging-1.2.jar
   Only in new/lib: hbase-shaded-mapreduce-2.4.2.jar
   Only in new/lib: htrace-core4-4.2.0-incubating.jar
   
   
   




