Posted to dev@hbase.apache.org by "Bryan Beaudreault (Jira)" <ji...@apache.org> on 2022/06/22 23:36:00 UTC

[jira] [Resolved] (HBASE-26909) hbase-shaded-mapreduce and hbase-shaded-client expose some of the same classes

     [ https://issues.apache.org/jira/browse/HBASE-26909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Beaudreault resolved HBASE-26909.
---------------------------------------
    Resolution: Won't Fix

> hbase-shaded-mapreduce and hbase-shaded-client expose some of the same classes
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-26909
>                 URL: https://issues.apache.org/jira/browse/HBASE-26909
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>              Labels: patch-available
>
> We supply 2 primary artifacts for end-users to consume:
>  * hbase-shaded-client, which is for general use
>  * hbase-shaded-mapreduce, which is for use when you need to connect to hbase via mapreduce, for example through TableInputFormat
> The problem is that these artifacts expose tons of duplicate classes. One example (among many) is org.apache.hadoop.hbase.Cell, which appears in both jars.
> This may not be a problem if your projects are always very isolated, either doing mapreduce or not. In that case you just depend on the one you need. However, many users work in much more complicated environments where dependencies tend to bleed between projects. Here's an illustration:
>  * Imagine a project FooService, which includes two modules FooServiceRestWeb (for the REST HTTP resources) and FooServiceData (which includes DAOs for accessing data). FooServiceRestWeb depends on FooServiceData to access hbase. In this case, FooServiceData should depend on hbase-shaded-client.
>  * Now imagine another project FooPipeline, which has modules FooPipelineHadoop (with M/R jobs for processing data) and FooPipelineData (which has some DAOs for accessing data). In this case, FooPipelineData might depend on hbase-shaded-mapreduce since the context is intended for M/R.
>  * The problem arises when we want to include some data from FooService in our pipeline. The most straightforward way to achieve this is by depending on FooServiceData, which has all of the DAOs for that data but also depends on hbase-shaded-client. At this point you have a problem, because FooPipelineHadoop now depends on both hbase-shaded-mapreduce and hbase-shaded-client.
> (Note, this obviously skirts around potential microservice solutions like only accessing FooService's data through the API... it's just for illustration, and it does come up.)
> From a plain Java perspective, having these 2 jars on the classpath is somewhat wasteful but not a huge issue, since the implementations are all the same.
> From a Maven perspective, it's problematic because the maven dependency plugin will complain about the conflicting classes.
> One potential fix is to add exclusions to the FooServiceData dependency, to avoid pulling in hbase-shaded-client. This works on a one-off basis but is much more painful in a large and complicated environment where this may come up hundreds of times.
> A better fix in my opinion is to make hbase-shaded-mapreduce depend on hbase-shaded-client and then only expose the classes that aren't already exposed by the shaded client.
> [~busbey] also mentioned a BOM being a potential solution, but I don't have experience with that.
>  
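
The overlap described in the issue can be checked directly by comparing the class entries of the two jars. The following is only a minimal sketch, not part of HBase or of any patch on this issue: the function names class_entries and compare_jars are hypothetical, and it assumes both shaded jars are available as local files. It also shows which classes hbase-shaded-mapreduce would still need to expose on its own if it depended on hbase-shaded-client, as proposed above.

```python
import zipfile


def class_entries(jar_path):
    """Return the set of .class entry names in a jar (a jar is a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def compare_jars(client_jar, mapreduce_jar):
    """Compare two jars by class entry name (hypothetical helper).

    Returns a pair of sorted lists:
      - classes present in both jars (the duplicates)
      - classes present only in the second jar
    """
    client = class_entries(client_jar)
    mapreduce = class_entries(mapreduce_jar)
    return sorted(client & mapreduce), sorted(mapreduce - client)
```

Calling compare_jars with the paths to a local hbase-shaded-client jar and a local hbase-shaded-mapreduce jar returns the duplicated classes first and the mapreduce-only classes second. Entries are compared by name only, so this says nothing about whether the bytecode in the two jars is identical.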



--
This message was sent by Atlassian Jira
(v8.20.7#820007)